ABSTRACT

We describe the VISTA Science Archive (VSA) and its first public release of data from five of the six VISTA Public Surveys. The 
VSA exists to support the VISTA Surveys through their lifecycle: the VISTA Public Survey consortia can use it during their quality 
control assessment of survey data products before submission to the ESO Science Archive Facility (ESO SAF); it supports their 
exploitation of survey data prior to its publication through the ESO SAF; and, subsequently, it provides the wider community with 
survey science exploitation tools that complement the data product repository functionality of the ESO SAF. 

This paper has been written in conjunction with the first public release of public survey data through the VSA and is designed to 
help its users understand the data products available and how the functionality of the VSA supports their varied science goals. We 
describe the design of the database and outline the database-driven curation processes that take data from nightly pipeline-processed 
and calibrated FITS files to create science-ready survey datasets. Much of this design, and the codebase implementing it, derives from 
our earlier WFCAM Science Archive (WSA), so this paper concentrates on the VISTA-specific aspects and on improvements made 
to the system in the light of experience gained in operating the WSA. 
One of the clearest trends in observational astronomy over the
past two decades has been the rise in the importance of sys- 
tematic sky surveys. When coupled with good archives, sky sur- 
veys provide homogeneous, re-usable data products facilitating 
a range of research programmes and, in particular, enabling the 
large-scale statistical analyses required for many of the most im- 
portant science goals in modern astronomy. 

Probably the most prominent of this generation of surveys
has been the Sloan Digital Sky Survey (SDSS; York et al. 2000),
whose SkyServer archive system (Szalay et al. 2002) demonstrated
the power of a survey archive based on a Relational Database
Management System (RDBMS). Such systems can
offer astronomers the ability to pose powerful analytical queries 
in Structured Query Language (SQL) against seamless, survey- 
wide source catalogues, thereby enabling survey science that 
would be impossibly cumbersome for an astronomer provided 
with nothing more than a repository of matching image and cat- 
alogue files for each of the thousands of pointings making up the 
survey dataset. 

The success of the SDSS SkyServer was a strong influence 
on the design of the archive component of the VISTA Data 
Flow System (VDFS; Emerson et al. 2004). VISTA, the Visible
and Infrared Survey Telescope for Astronomy, is described
by Emerson et al. (2006). VDFS was designed as a two-phase
project, with an initial goal of supporting near-infrared surveys
to be conducted with the Wide Field CAMera (WFCAM;
Casali et al. 2007) on the UK Infrared Telescope (UKIRT) and
an ultimate objective of supporting surveys with VISTA. Within
VDFS, the Cambridge Astronomy Survey Unit (CASU) run a
night-by-night data processing pipeline, with the Wide-Field
Astronomy Unit (WFAU) in Edinburgh generating further data
products and providing science archive facilities.

The first-generation VDFS archive, the WFCAM Science
Archive (WSA), is described by Hambly et al. (2008) and
serves catalogue and image data from the UK Infrared Deep
Sky Survey (UKIDSS; Lawrence et al. 2007), as well as other,
PI-mode, data taken with WFCAM. More than 1000 users
are registered for authenticated access to proprietary UKIDSS
data through the WSA, and it supports anonymous access by
a larger community once the data are public. The phased approach
adopted within the VDFS ensured that the design and development
of the WSA progressed with scalability to the larger
data volumes of VISTA kept explicitly in mind, along with the
likely scientific usage patterns of the VISTA surveys. For example,
Cross et al. (2009) describe the enhancements made to the
WSA database schema to support time-series analysis of multi- 
epoch data, which was prototyped using observations from the 
UKIDSS Deep Extragalactic Survey, but motivated by the re- 
quirements for supporting variability analyses with VISTA. 

The initial scientific programme for VISTA is mostly fo- 
cussed on six ESO Public Surveys which deliver reduced images 
and derived catalogue data products to the ESO Science Archive 
Facility (SAF). Five of the six Public Survey consortia (see §2)
use the VDFS for the generation of these data products and employ
the VISTA Science Archive (VSA) to manage their data,
both for quality assurance analysis and preliminary exploitation 
prior to submission to the ESO SAF and, following its publica- 
tion there, to provide the wider community with sophisticated 
science archive capabilities that complement the data product 
repository functionality of the ESO SAF. 
1.1. Summary of earlier VDFS work

Since the design of the VSA is derived so directly from that of
the well-used WSA, we strongly urge readers who are unfamiliar
with the VDFS science archives to read the existing papers
on the WSA (Hambly et al. 2008; Cross et al. 2009), as well as
the comprehensive online documentation provided on the VSA
website, in conjunction with this paper. In this section, however,
we summarise the main points of those papers that are relevant
to the VSA.

VISTA and UKIRT/WFCAM take very similar basic data, 
which consists of images obtained by a wide-field near infrared 
imaging array on a ~4 m telescope. The two cameras use large-format
2048 × 2048 pixel infrared detectors: sixteen with 0.34 arcsec pixel⁻¹
in the case of VISTA, and four with 0.4 arcsec pixel⁻¹ in the
case of WFCAM. Images are taken in 5 broad-band filters (Z,
Y, J, H, K/Ks) and several narrow-band filters, within the wavelength
range 0.8 < λ < 2.5 μm, for a range of large surveys
and smaller PI programmes. Large areas can be covered by repeating
a basic pattern of images: a square pattern of 4 adjacent
pawprints in the case of WFCAM.

Observing time is divided between large surveys (UKIDSS
and the Campaigns on UKIRT/WFCAM), smaller PI-led programmes
(time awarded by a telescope allocation committee),
service mode observations for very small projects, and special
projects such as Director's Discretionary Time projects and
calibration work. WFAU ingest all of these data into the WSA, but the
key design features were driven by the requirements of the main
surveys: the UKIDSS surveys and the then future VISTA Public
Surveys.

In VDFS, a single-exposure, reduced science frame is designated
a normal frame. These frames can be stacked (coadded
to increase the signal-to-noise) with small offsets in position
(a jitter pattern) to reduce the effects of bad pixels; the
resulting image frame is designated a stack frame. The pixel size
of WFCAM images is 0.4", but the seeing is sometimes as good
as 0.5", so the images can be undersampled. To produce critically
sampled images in the best seeing conditions, a technique
called micro-stepping was used, in which a series of frames
offset from each other by half (or a third) of a pixel in the x and y
directions are interleaved to create a 2 × 2 (leav) image with 0.2"
pixels (or a 3 × 3 image with 0.133" pixels). In reality a small
integer-pixel offset is added to each step as well, to reduce the
effects of bad pixels. Several leav frames with different jitter
positions are stacked together, and the resulting image frame is
designated a leavstack frame.
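To make the interleaving step concrete, here is a minimal NumPy sketch of 2 × 2 micro-step interleaving. It assumes four equal-sized frames whose sampling positions differ by exactly half a pixel. It ignores the extra integer-pixel jitter offsets, flux rescaling and confidence maps that a real pipeline must handle. `interleave_2x2` is a hypothetical name, not a VDFS routine.

```python
import numpy as np


def interleave_2x2(f00, f10, f01, f11):
    """Interleave four half-pixel-offset frames into one 2 x 2 'leav' image.

    fXY samples the sky shifted by X/2 pixel in x and Y/2 pixel in y
    (array axes are [y, x]). The output has twice as many pixels along
    each axis, i.e. half the pixel scale (0.4" -> 0.2" for WFCAM).
    """
    ny, nx = f00.shape
    dtype = np.result_type(f00, f10, f01, f11)
    out = np.empty((2 * ny, 2 * nx), dtype=dtype)
    out[0::2, 0::2] = f00  # reference position
    out[0::2, 1::2] = f10  # +1/2 pixel in x
    out[1::2, 0::2] = f01  # +1/2 pixel in y
    out[1::2, 1::2] = f11  # +1/2 pixel in x and y
    return out


# Toy frames with constant values, so the interleaving pattern is easy to see.
frames = [np.full((3, 4), value, dtype=float) for value in (1, 2, 3, 4)]
leav = interleave_2x2(*frames)
print(leav.shape)  # (6, 8)
print(leav[:2, :2])
```

A 3 × 3 version follows the same pattern, using slices with a stride of 3 and nine input frames.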

A stack (leav or otherwise) is the fundamental science image,
from which catalogues are extracted. If there are multiple
epochs in the same filter and pointing, a deepstack or deepleav- 
stack may be created to find fainter sources. All of these science 
frames have associated confidence (conf) images that contain the 
relative weighting of each pixel. This includes bad pixel mask- 
ing, the effects of jittering, seeing and exposure time weighting. 

At WFAU, for the purposes of archiving, UKIDSS as a whole
was referred to as a survey, while its different parts (the
Large Angular Survey (LAS), the Galactic Plane Survey (GPS),
the Deep Extragalactic Survey (DXS), etc.) were referred to as
programmes, and PI-led programmes were referred to as non-survey
programmes. Since all UKIDSS programmes were released
together and there was a management structure covering
the whole of UKIDSS, the distinction between survey and programme
was useful and clear. UKIRT divide observations into
projects, each with its own identifier. We assign projects to
programmes based on these metadata identifiers: e.g. the header
value u/ukidss/gcs5 of the PROJECT keyword denotes UKIDSS GCS
(Galactic Clusters Survey) project observing set number 5, which
is assigned to programmeID=103 in the WSA.
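The project-to-programme assignment can be sketched as a parse of the PROJECT value followed by a table lookup. The regular expression, the function name and the lookup dictionary are all hypothetical. Only the GCS example (u/ukidss/gcs5 mapping to programmeID 103) comes from the text. A real system would more plausibly keep such a mapping in database tables than in code.

```python
import re

# Hypothetical lookup table: only the GCS entry is taken from the text.
PROGRAMME_IDS = {("ukidss", "gcs"): 103}

# Assumed PROJECT layout: u/<survey>/<programme letters><observing set number>
PROJECT_RE = re.compile(r"^u/(?P<survey>[a-z]+)/(?P<prog>[a-z]+)(?P<set>\d+)$")


def assign_programme(project):
    """Return (programmeID, observing set number) for a PROJECT value.

    Returns (None, None) if the value does not match the assumed layout;
    programmeID is None if the programme is not in the lookup table.
    """
    match = PROJECT_RE.match(project.strip().lower())
    if match is None:
        return None, None
    key = (match["survey"], match["prog"])
    return PROGRAMME_IDS.get(key), int(match["set"])


print(assign_programme("u/ukidss/gcs5"))  # (103, 5)
```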

The data in the WSA are stored in a relational database man- 
agement system (RDBMS), which stores data in a set of related 
tables. In an RDBMS, missing values can be handled in a couple
of ways: by setting them to NULL, or by giving them a default
value. Setting them to NULL has the advantage that it is clear
that data are missing, but the disadvantage that all queries
have to consider NULL values. For example, when querying
for objects in a certain colour range, -1.0 < (J-H) < 1.0,
it would be necessary to include the constraint "and (J-H) is not
null", since (J-H) would be NULL whenever one of the two images had
not yet been observed or the object was too faint in the J or H
band to be detected. Our solution is to use default values instead,
choosing as our standard defaults large negative values that lie
outside the normal range of most physical quantities. The default
values for each column are given in the schema browser (see below).
No column in any table allows NULL: every column has at least
a default value.
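The practical effect of large negative defaults can be illustrated with a small, self-contained sketch. It uses SQLite, through Python's standard `sqlite3` module, purely as a stand-in for SQL Server. The table, the column name `jmhPnt` and the default value -0.9999995e9 are illustrative assumptions, not a copy of the VSA schema.

```python
import sqlite3

DEFAULT = -0.9999995e9  # assumed "missing value" default, far outside any real J-H

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE source ("
    " sourceID INTEGER PRIMARY KEY,"
    f" jmhPnt REAL NOT NULL DEFAULT {DEFAULT!r})"
)
con.executemany("INSERT INTO source VALUES (?, ?)", [(1, 0.45), (2, 1.3)])
con.execute("INSERT INTO source (sourceID) VALUES (3)")  # J or H missing: default used

# A two-sided colour cut excludes the default automatically, with no NULL test.
rows = con.execute(
    "SELECT sourceID FROM source WHERE jmhPnt > -1.0 AND jmhPnt < 1.0"
).fetchall()
print(rows)  # [(1,)]

# Caveat: a one-sided cut does NOT exclude large negative defaults.
one_sided = con.execute(
    "SELECT sourceID FROM source WHERE jmhPnt < 1.0 ORDER BY sourceID"
).fetchall()
print(one_sided)  # [(1,), (3,)]

# NOT NULL means an explicit NULL is rejected rather than stored.
try:
    con.execute("INSERT INTO source VALUES (4, NULL)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The one-sided query shows why a lower bound (or an explicit test against the default) is still needed when a cut is open towards negative values.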

The pixel data from images are not stored directly in the
RDBMS as binary large objects (BLOBs). Instead, they are stored as
flat files in multi-extension FITS format (MEF; Pence et al. 2010),
i.e. a FITS file with a primary header containing metadata about
the whole file, and secondary extensions each holding the binary
data from one detector together with a header of detector-specific
attributes. The file paths are stored in the archive, along with all
of the metadata. The databases are self-describing, since all the
information about the surveys is also stored in the database, in
curation tables: database tables designed principally to aid the
curation of each programme (see §3.1.5).

The design of the relational database should capture the inherent
structure of the data stored. For instance, the image metadata
can be divided into different groupings, e.g. metadata related
to the whole MEF, such as observation time, filter, project,
PI and airmass, and metadata related to each detector extension,
such as sky level, zeropoint and seeing. When designing a relational
database, we capture this structure in an entity-relationship model
(ERM); see Fig. 4 for an example. An ERM contains entities
(shown as boxes with rounded corners) which represent a col- 
lection of related data, e.g. primary header data from each FITS 
image or sources in a merged filter catalogue. The relationship 
between entities is represented by lines between them which 
are mandatory (solid line), optional (dotted) and can be one-to- 
one (a single line), one-to-many (a single line that branches into 
three, similar to a crow's foot) or many-to-many (three lines con- 
verge to a single line that branches into three). If the two tables 
share the same primary key a perpendicular line is added across 
the main line. The basic ERMs for the WSA (which are relevant
for the VSA) are shown in Hambly et al. (2008), and we show
the multi-epoch ERMs in Cross et al. (2009).

While the database models presented so far could be imple- 
mented in any RDBMS, the WSA and VSA are implemented in 
a commercial software product, Microsoft SQL Server, which 
is suitable for medium to large scale databases. This was also 
the choice that the Sloan Digital Sky Survey team made, which 
heavily influenced our decision. When we implement the data 
model the ERMs are converted into a schema, a set of database 
objects. Most of these objects are tables, with entities in the 
ERM mapping to tables in the schema. The tables hold all the 
data and can be queried via the user interface applications. 

The WSA provides a schema browser, which contains all the 
information about all of the tables in each release. The left hand 
side gives the user a list of surveys and releases and then the
tables, functions and views associated with each one. Selecting
a table gives a description of the table and then the list of
attributes in the table. Each attribute has a name, type (e.g. int,
real, float), length (in bytes), units, description, default value and
unified content descriptor. Each table has a primary key: a single
attribute, or a combination of attributes, whose value is unique
for every row in the table.
For instance, the Multiframe table contains an entry for each
multi-extension FITS image, so its primary key is
multiframeID, whereas in the lasDetection table, entries are
objects extracted from each extension of a FITS image and, within
each extraction, are given a sequence number, so the primary key
is (multiframeID, extNum, seqNum). Any constraints with respect
to other tables are listed at the top, and all attributes that
are indexed for fast searches are highlighted. The primary key
attributes are indexed automatically. Some attributes require a
more detailed description; these are linked to a glossary and
have a book symbol beside the attribute name, which can
be clicked on.
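The sketch below shows how such keys might be declared, again using SQLite via Python's `sqlite3` as a stand-in for SQL Server. Only a handful of columns are shown, and their types and defaults are assumptions. The real Multiframe and lasDetection tables have many more attributes, documented in the schema browser.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
CREATE TABLE Multiframe (
    multiframeID INTEGER NOT NULL PRIMARY KEY,  -- one row per MEF file
    filterID     INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE lasDetection (
    multiframeID INTEGER NOT NULL REFERENCES Multiframe (multiframeID),
    extNum       INTEGER NOT NULL,   -- detector extension within the MEF
    seqNum       INTEGER NOT NULL,   -- running number within that extraction
    ra           REAL NOT NULL DEFAULT -0.9999995e9,
    dec          REAL NOT NULL DEFAULT -0.9999995e9,
    PRIMARY KEY (multiframeID, extNum, seqNum)
);
""")

con.execute("INSERT INTO Multiframe (multiframeID, filterID) VALUES (1, 3)")
con.execute(
    "INSERT INTO lasDetection (multiframeID, extNum, seqNum) VALUES (1, 2, 1)"
)

rejected = []
for values in [(1, 2, 1),   # same (multiframeID, extNum, seqNum): duplicate key
               (9, 2, 1)]:  # multiframeID 9 does not exist in Multiframe
    try:
        con.execute(
            "INSERT INTO lasDetection (multiframeID, extNum, seqNum) "
            "VALUES (?, ?, ?)", values)
    except sqlite3.IntegrityError as err:
        rejected.append((values, str(err)))

for values, reason in rejected:
    print(values, "->", reason)
```

The composite key rejects the duplicate row, and the foreign key rejects a detection whose parent multiframe does not exist.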

Other database objects include Views and Functions. Views 
are selections of data from tables already in the database. They 
can be a subset from one table or a superset of many, and are 
queried in the same way as tables, but no extra data is stored. 
These are used, for instance, in the WSA when we give a subset 
of the source tables for some UKIDSS surveys which have data 
taken in all filters. Functions take inputs and do specific calculations.
We have some that do spherical astronomy calculations,
give expected magnitude limits, convert RA and Dec to a sexagesimal
string and give names for objects in the IAU convention.
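The following Python sketch shows the kind of calculation these functions perform. It is not the VSA's SQL implementation: the function names, the output formats and the "VISTA" prefix are hypothetical. Designations follow the IAU recommendation to truncate rather than round the coordinates.

```python
import math


def _sexagesimal(value, decimals, truncate=False):
    """Split a non-negative value (hours or degrees) into whole units,
    minutes and seconds, keeping `decimals` places in the seconds.

    Working in integer units of the last decimal place means rounding can
    never produce '60.00' seconds; it carries into the minutes instead.
    """
    scale = 10 ** decimals
    ticks = value * 3600 * scale
    # the tiny epsilon guards against floats falling just below a tick boundary
    ticks = math.floor(ticks + 1e-6) if truncate else round(ticks)
    units, rem = divmod(int(ticks), 3600 * scale)
    minutes, secs = divmod(rem, 60 * scale)
    return units, minutes, secs / scale


def radec_to_sexagesimal(ra_deg, dec_deg):
    """RA/Dec in decimal degrees -> ('hh:mm:ss.ss', '+dd:mm:ss.s')."""
    h, m, s = _sexagesimal(ra_deg / 15.0, 2)
    d, dm, ds = _sexagesimal(abs(dec_deg), 1)
    sign = "-" if dec_deg < 0 else "+"
    return f"{h % 24:02d}:{m:02d}:{s:05.2f}", f"{sign}{d:02d}:{dm:02d}:{ds:04.1f}"


def iau_name(ra_deg, dec_deg, prefix="VISTA"):
    """IAU-style designation 'PREFIX Jhhmmss.ss+ddmmss.s' (truncated, not rounded).

    The prefix is a hypothetical placeholder; each survey defines its own.
    """
    h, m, s = _sexagesimal(ra_deg / 15.0, 2, truncate=True)
    d, dm, ds = _sexagesimal(abs(dec_deg), 1, truncate=True)
    sign = "-" if dec_deg < 0 else "+"
    return f"{prefix} J{h % 24:02d}{m:02d}{s:05.2f}{sign}{d:02d}{dm:02d}{ds:04.1f}"


print(radec_to_sexagesimal(10.684708, 41.26875))  # ('00:42:44.33', '+41:16:07.5')
print(iau_name(10.684708, 41.26875))              # VISTA J004244.32+411607.5
```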

1.2. Outline of paper 

This paper describes the VSA and its first public release of data 
from the five VDFS-supported Public Surveys. In Section 2 we
discuss the VISTA telescope and the Public Surveys and compare
them to UKIRT-WFCAM, focussing on the essential differences
that affect the VSA. In Section 3 we provide an overview of
the VSA, discussing the table structure, before we compare the
VSA to the WSA in Section 4. We discuss changes to the image
metadata, the catalogue parameters and the infrastructure
of the VSA compared to the WSA in Sections 5 to 7, and new
features that are common to both in Section 8. Section 9 provides
examples of some of the different types of science queries
that the VSA supports, while Section 10 provides details of the
contents of the first VSA releases of the five VDFS-supported
VISTA Public Surveys. We summarise this paper and discuss future
work in Section 11, while several appendices provide technical
details supplementing the main body of the paper.

2. Overview of VISTA and its Public Surveys 

The Visible and Infrared Survey Telescope for Astronomy
(VISTA; Emerson et al. 2006) is currently the fastest near-infrared
survey telescope, with an etendue (collecting area times
instantaneous field-of-view) of approximately 6.5 m² deg². It has a
4 m f/1 primary mirror and a 1.2 m secondary, giving it a 1.65 degree
diameter field-of-view (see Emerson & Sutherland 2010a,b). The
VISTA Infra Red CAMera (VIRCAM; Dalton et al. 2010) has
16 2048 × 2048 pixel non-buttable Raytheon VIRGO HgCdTe
detectors with a quantum efficiency > 80% between 0.9 μm
and 2.4 μm. The pixel scale is 0.34" and the instantaneously
sampled field-of-view is 0.6 sq. deg (see Fig. 1). Compared to its
nearest counterpart, the United Kingdom Infra Red Telescope
with its Wide Field CAMera (UKIRT-WFCAM; Casali et al.
2007), VISTA surveys ~6 times faster, having
twice the sensitivity (owing to increased throughput for a
similar-sized telescope) and 3 times the area per pointing.
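The 0.6 sq. deg and "three times the area" figures follow directly from the detector format and pixel scales quoted above. The short Python check below ignores inter-detector gaps and optical distortion. Like the text, it simply treats the factor of two in sensitivity as a factor of two in survey speed.

```python
def pawprint_area_deg2(n_detectors, npix, arcsec_per_pixel):
    """Sky area covered by the detectors of one pawprint, excluding gaps."""
    side_deg = npix * arcsec_per_pixel / 3600.0
    return n_detectors * side_deg ** 2


vista = pawprint_area_deg2(16, 2048, 0.34)
wfcam = pawprint_area_deg2(4, 2048, 0.40)
area_ratio = vista / wfcam
speed_ratio = 2.0 * area_ratio  # factor 2 in sensitivity taken at face value

print(f"VISTA {vista:.2f} deg^2, WFCAM {wfcam:.2f} deg^2")  # VISTA 0.60 deg^2, WFCAM 0.21 deg^2
print(f"area ratio {area_ratio:.1f}, survey-speed ratio {speed_ratio:.1f}")  # 2.9, 5.8
```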

ESO's Science Verification for VISTA started at the end
of 2009 and the main science programme commenced in early
2010. VISTA's programme initially focuses on six ESO Public
Surveys², nicely complementing the sub-surveys of UKIDSS in
the northern sky.

These six surveys are: 

- VHS: the VISTA Hemisphere Survey³;

- VVV: the VISTA Variables in Via Lactea⁴ (Saito et al. 2012);

- VMC: the VISTA Magellanic Cloud survey⁵ (Cioni et al. 2011);

- VIKING: the VISTA Kilo-degree INfrared survey for
Galaxies (Findlay et al. 2012);

- VIDEO: the VISTA Deep Extragalactic Objects survey⁶
(Jarvis et al. 2012);

- UltraVISTA⁷ (McCracken et al. 2012).

These surveys have a 'wedding cake' arrangement of galac- 
tic/extragalactic surveys (VHS, VIKING, VIDEO, UltraVISTA) 
with different depth/area combinations and two specialised 
stellar astronomy programmes (VVV, VMC), much like the 
UKIDSS surveys. The five surveys supported by the VDFS are 
VHS, VVV, VMC, VIKING and VIDEO. UltraVISTA makes 
use of the CASU pipeline products, but is not currently archiv- 
ing its data in the VSA. 

VISTA data are calibrated on the natural VISTA photometric
system (see Hodgkin et al. 2012, in preparation). All magnitudes
(unless designated as AB magnitudes) are Vega magnitudes on this system.

2.1. Differences in the telescope and instrument between
VISTA & WFCAM

For detailed descriptions of VISTA and WFCAM, see
Emerson et al. (2006) and Casali et al. (2007) respectively. In
this section we discuss only the salient differences that
affect the design of the VSA compared to the WSA.

2.1.1. The VISTA focal plane: pawprints and tiles

VISTA is significantly different from UKIRT/WFCAM in sev- 
eral important aspects, which affect image processing and sub- 
sequent archive operations. The most significant differences of 
VISTA to UKIRT/WFCAM are the arrangement of the focal 
plane and the ability VISTA's alt-azimuth mount provides to ob- 
serve the same piece of sky in any orientation with respect to the 
focal plane. 

VISTA has 16 2k × 2k Raytheon VIRGO detectors arranged
in a pawprint, with detectors spaced 90% (10.4') of a detector
apart in the X-direction and 42.5% (4.9') apart in the Y-direction
(see Fig. 1), whereas WFCAM has 4 2k × 2k Hawaii-2 detectors
arranged in a pawprint (see Fig. 2) with the same spacing of 94%
(12.8') in each direction.
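The angular gaps in parentheses follow from the fractional spacings, the 2048-pixel detector width and the pixel scales (0.34" for VISTA, 0.4" for WFCAM). A quick check in Python, with a hypothetical helper name:

```python
DETECTOR_PIXELS = 2048
PIXEL_SCALE = {"VISTA": 0.34, "WFCAM": 0.40}  # arcsec per pixel, from the text


def gap_arcmin(camera, fraction):
    """Gap between adjacent detectors, given as a fraction of a detector width."""
    detector_arcsec = DETECTOR_PIXELS * PIXEL_SCALE[camera]
    return fraction * detector_arcsec / 60.0


for camera, axis, fraction in [("VISTA", "X", 0.90),
                               ("VISTA", "Y", 0.425),
                               ("WFCAM", "X and Y", 0.94)]:
    print(f"{camera} {axis}: {gap_arcmin(camera, fraction):.1f} arcmin")
```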

2 http://www.eso.org/public/teles-instr/surveytelescopes/vista/surveys.html

3 http://www.ast.cam.ac.uk/~rgm/vhs/

4 http://vvvsurvey.org

5 http://star.herts.ac.uk/~mcioni/vmc/

6 http://star-www.herts.ac.uk/~mjarvis/video/

7 http://www.strw.leidenuniv.nl/~ultravista/
Fig. 1. The VISTA focal plane showing 16 2k × 2k detectors
with 90% spacing in the x-direction and 42.5% in the
y-direction. There are also two auto-guider (AG) and two
low-order wave-front sensor (LOWFS) detectors.



The basic filled survey area for VISTA is a tile made up of six
pawprints: three in the Y-direction, separated by 0.475 of a
detector, and two in the X-direction, separated by 0.95 of a
detector. Apart from two edge strips that receive only a single
exposure, every pixel of this tile receives between two and six
times the exposure time of a single pawprint, with a mode of two
exposures (see Fig. 3). There are two possible ways to achieve a
required uniform minimum depth (two pawprints contributing) across
a multi-tile survey. The most efficient in terms of observing time is
for successive tiles to overlap the strips at top and bottom, coadding
the data from the two separate tiles; each tile then has an area of
1.636 sq. deg covered at least twice. However, reaching the full
depth by coadding these two strips can be complicated by varying
sky conditions if the adjacent tiles are not observed under the
same conditions (e.g. different PSFs and sky levels). Indeed,
the same two effects can be a problem in making a tile from
six pawprints (depending on how rapidly the seeing varies between
pawprints). The other, less efficient but simpler, way to
achieve constant depth across a multi-tile survey is to butt
together the regions of tiles that have reached the minimum double
exposure, ignoring the singly exposed strips; this results in an area
of 1.501 sq. deg covered at least twice per tile.
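The exposure pattern of Fig. 3 can be reproduced approximately with an idealised model. The sketch assumes square, undistorted detectors, measured in units of one detector side, with the spacings and pawprint offsets quoted above. The 0.34" scale converts the single-exposure strip to degrees. Because each pawprint's footprint is a product of an X coverage and a Y coverage, and the six offsets form a 2 × 3 grid, the exposure count separates into a product of two 1-D counts. That keeps the sketch short.

```python
import numpy as np

# Idealised geometry in units of one detector side (assumption: square,
# undistorted detectors). Spacings and offsets are those quoted in the text.
N_DET = 4                        # detectors per row and per column
GAP_X, GAP_Y = 0.90, 0.425       # gap between detectors, as a fraction of a detector
OFFSETS_X = (0.0, 0.95)          # two pawprint positions in X
OFFSETS_Y = (0.0, 0.475, 0.95)   # three pawprint positions in Y
DET_DEG = 2048 * 0.34 / 3600.0   # detector side in degrees
STEP = 0.005                     # sampling step in detector units


def coverage_1d(gap, offsets):
    """Count, along one axis, how many pawprint positions cover each sample."""
    extent = max(offsets) + (N_DET - 1) * (1.0 + gap) + 1.0
    coords = (np.arange(int(round(extent / STEP))) + 0.5) * STEP
    counts = np.zeros(coords.size, dtype=int)
    for off in offsets:
        for k in range(N_DET):
            lo = off + k * (1.0 + gap)
            counts += (coords >= lo) & (coords < lo + 1.0)
    return counts


# 2-D exposure count = (Y coverage count) x (X coverage count); axes are [y, x].
exposure = np.outer(coverage_1d(GAP_Y, OFFSETS_Y), coverage_1d(GAP_X, OFFSETS_X))

levels = np.unique(exposure)
print(levels)                                  # [1 2 3 4 6]
print(np.bincount(exposure.ravel()).argmax())  # 2: the modal exposure count
print(f"single-exposure strip height: {0.475 * DET_DEG:.3f} deg")  # 0.092 deg
```

In this idealised model the overall extent comes out at roughly 1.48° × 1.20°. That is close to, but not exactly, the dimensions implied by Fig. 3, because real detector sizes and field distortion are not modelled.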

In the case of WFCAM, four pawprints are required to make
a filled tile, every pixel receives a single exposure and there are
no edge strips, so large contiguous regions of the sky can be
surveyed simply by overlapping subsequent pawprints, as shown in
Fig. 2. We note that, in common with WFCAM, 'microstepping'
is possible with VISTA, but it is not usually necessary since the 
pixel size on VISTA is smaller than WFCAM and the typical 
seeing at Cerro Paranal is slightly worse than Mauna Kea, so 
most images are critically sampled. Moreover, its use is not rec- 
ommended by ESO and the VISTA Public Surveys do not use 
the technique. 




Fig. 2. Areas of sky can be efficiently surveyed by arranging 4 
WFCAM pawprints (left) into the arrangement on the right. Each 
colour in the right hand image represents a different pawprint. 
There is a small amount of overlap at each edge. 




Fig. 3. An exposure map of a VISTA tile. The green strips 
at top & bottom have a single exposure. The majority of the 
area (blue) has two exposures, the pink has 3 exposures, the 
red 4 and the white 6. The doubly or more exposed area is 
1.501 sq. deg. The singly exposed green strips at top & bottom
of the plot are each 1.475 deg × 0.092 deg = 0.135 sq. deg and can
be overlapped by corresponding areas from adjacent tiles for
many surveys. Assigning each 0.092 deg overlap region (the top
strip of one tile coadded with the bottom strip of its neighbour)
to one of the two tiles involved means that each tile, when part
of a filled larger area, covers (1.017 + 0.092) × 1.475 = 1.636 sq. deg
at least twice.

2.1.2. Merging VISTA pawprints into tiles

The processing of raw pawprint frames into calibrated images is
done by the VDFS pipeline at CASU on a nightly basis⁸ and includes
combining the pawprints into tiles. These pawprints and
tiles are ingested into the VSA without any additional image
processing. If any pointing is observed only once in a given filter
in the survey design, the nightly pipeline-processed catalogue
product is used to generate the merged Source table (see §3.1).
For deeper surveys, however, stacking tiles observed multiple
times typically involves observations over several different
nights, and multi-night products are not the responsibility of
CASU. WFAU creates tiles from multiple nights of data by first
stacking the separate component pawprints and then combining
the six stacked pawprints into a tile. Stacking the tiles rather than
the component pawprints is strongly discouraged.

8 http://casu.ast.cam.ac.uk/surveys-projects/vista/technical

While combining pawprints to form a tile is quite
standard when working with visible images (Fruchter & Hook
2002), ground-based infrared sky subtraction is difficult because
the sky is so much brighter than in the optical and dominates
the flux of most objects. Furthermore, the distortion of the VISTA
camera across the field-of-view, and the larger variation in both
the sky and the point-spread function (PSF) within the duration
of the observations needed to create a single tile, make it
necessary to do additional processing (Lewis et al. 2010). Tiles are
processed using the following procedure⁹:

- Stack all components of each pawprint to reach the in- 
tended signal-to-noise while removing bad pixels by taking 
the clipped median flux value. Each component is shifted by 
a different number of pixels in the X and Y directions as de- 
fined by the jitter pattern so that bad pixels or columns appear 
in different positions in each component; 

- Extract the catalogue from each pawprint. This detects
contiguous groups of at least 4 pixels that are all brighter than
the background by more than 1.5 times the sky noise.
Overlapping objects are deblended;

- Recalculate the world coordinate system (WCS) of each
pawprint, by comparing stars in the catalogue to the Two
Micron All Sky Survey (2MASS) catalogue (Skrutskie et al.
2006);

- Calculate the photometric zero-points (VISTA system) of 
each detector in each pawprint and update the headers. 
For observing-block processing, the zero-points are derived by
comparison with 2MASS, using the colour equations calculated by
CASU and whichever stars of suitable colour are in common between
2MASS and the VISTA image in question. For deep stacks
and tiles, individual stars are compared to the component 
frames which have already been calibrated using 2MASS. 
The stars in the components and the deep stack are all on the 
same VISTA system, so a direct comparison is possible; 

- Filter each pawprint to smooth out large-scale (> 30")
variations in the background; see Irwin (2010) for details;

- Mosaic the 6 unfiltered pawprints into a tile, to produce a 
tile with large scale features present. Mosaicing adjusts all 
96 components (16 detectors in each pawprint) to the same 
level and drizzles them on to a single tangent-plane projec- 
tion image; 

- Mosaic all the filtered pawprints to produce a tile without 
large-scale background features; 

- Extract the catalogue from the filtered tile; 

- Recalculate the WCS of the tile; 

- Calculate the photometric zero-point (VISTA system) of the 
tile catalogue; 

- 'Grout' the tile catalogue to find the correct PSF in each region
of the tile and the correct offset for the Modified Julian
Date. The grouting procedure tracks the variable flux within
the first seven pre-defined circular apertures, with radii of
0.5", 1/√2", 1", √2", 2", 2√2" and 4". Differential aperture
corrections (i.e. the difference between the flux in the aperture
and the total flux for a point source) are calculated for
each detector in each pawprint that composes the tile. The
fluxes in these 7 apertures are then recalculated; larger
aperture fluxes and other fluxes, such as the Petrosian
flux, are not corrected, since they are only
marginally affected by the seeing variations;

- Reclassify the stars and galaxies in the tile catalogue.
Classification originally occurs as part of the extraction
process and uses the curve of growth to calculate a
stellarness-of-profile statistic and classification (galaxy 1,
star -1, noise 0, saturated -9, probable star -2, or probable
galaxy -3). When combined, the different PSFs in each pawprint
can give misleading classifications, so the classification code
takes the grouting information about each pawprint and
re-estimates the stellarness-of-profile statistic and classification;

- Recalculate the photometric zero-point (VISTA system) of
the tile catalogue;

- Remove any temporary files, such as the filtered pawprints.

9 http://casu.ast.cam.ac.uk/surveys-projects/vista/technical/tiles
gives more details of the algorithms
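The first two steps of the list above (stacking the jittered components with outlier rejection, then detecting groups of at least 4 pixels above 1.5 times the sky noise) can be illustrated with a toy NumPy sketch. It is not the CASU implementation. The MAD-based 3σ clipping, the 4-connected grouping, the known sky level and noise, and the absence of deblending and confidence maps are all simplifying assumptions. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def clipped_median_stack(frames, nsigma=3.0):
    """Median-combine registered frames after per-pixel outlier rejection.

    frames: array of shape (n_frames, ny, nx), already shifted onto a common
    grid. Values further than nsigma * (MAD-based sigma) from the per-pixel
    median are masked before the final median is taken.
    """
    data = np.asarray(frames, dtype=float)
    med = np.median(data, axis=0)
    sigma = 1.4826 * np.median(np.abs(data - med), axis=0)
    bad = np.abs(data - med) > nsigma * np.maximum(sigma, 1e-12)
    stacked = np.ma.median(np.ma.masked_array(data, mask=bad), axis=0)
    return np.ma.filled(stacked, np.nan)


def detect(image, sky, sky_noise, threshold=1.5, min_pixels=4):
    """Return lists of (y, x) pixels forming 4-connected groups above threshold."""
    above = image > sky + threshold * sky_noise
    seen = np.zeros(above.shape, dtype=bool)
    ny, nx = above.shape
    objects = []
    for y0, x0 in zip(*np.nonzero(above)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        group, queue = [], deque([(y0, x0)])
        while queue:  # breadth-first flood fill of one connected group
            y, x = queue.popleft()
            group.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < ny and 0 <= xx < nx and above[yy, xx] and not seen[yy, xx]:
                    seen[yy, xx] = True
                    queue.append((yy, xx))
        if len(group) >= min_pixels:
            objects.append(group)
    return objects


rng = np.random.default_rng(42)
truth = np.zeros((40, 40))
truth[18:21, 18:21] = 20.0  # a 3x3 "star"
truth[30, 5:7] = 5.0        # a 2-pixel blip: too small to count as a detection
frames = truth + rng.normal(0.0, 0.01, size=(5, 40, 40))
frames[2, 5, 7] = 1.0e4     # hot pixel in one jitter position only
frames[3, :, 30] = 500.0    # bad column in another jitter position

stack = clipped_median_stack(frames)
objects = detect(stack, sky=0.0, sky_noise=1.0)  # sky level and noise assumed known
print(len(objects), [len(obj) for obj in objects])  # 1 [9]
```

Without the clipping, the bad column would survive into the stack as a 40-pixel "object". The 2-pixel blip is rejected by the minimum-size criterion.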

Unsurprisingly the filtering also makes it impossible to 
produce accurate catalogue values for large extended sources, 
e.g. nearby galaxies or Galactic nebulae. The filtering scale used 
for the main VISTA Public Surveys removes structure on scales 
larger than 30". This is a similar scale to the local background 
scale length (22") which would in any case limit the accuracy of 
any photometry of larger objects. 

2.1.3. Active Optics 

The quality of VISTA images is maintained, as it observes at
different elevations and its temperature changes, by updating the
position of the secondary mirror, with corrections derived from
look-up tables and the low-order wavefront sensors. If a
correction has not been applied recently enough, or is bad for
some reason, the image quality can be degraded. Some images
show evidence of such problems. Together with seeing variations
between pawprints, and the PSF variation across the field that
exists even when the optics are perfectly aligned, these lead to
a greater variety of PSFs than in the case of WFCAM.
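To see why PSF differences between pawprints matter for the aperture fluxes that grouting corrects (§2.1.2), consider the aperture correction for a point source. The toy calculation below assumes a circular Gaussian PSF. Real PSFs have broader wings, so the numbers are indicative only, and the two FWHM values are arbitrary examples. The radii are the seven grouted apertures, 0.5√2ᵏ arcsec for k = 0 to 6.

```python
import math

# Radii of the seven grouted apertures in arcsec: 0.5 * sqrt(2)**k, k = 0..6.
RADII = [0.5 * math.sqrt(2.0) ** k for k in range(7)]


def gaussian_aperture_correction(radius, fwhm):
    """Magnitude to add to an aperture magnitude to estimate the total
    magnitude of a point source, assuming a circular Gaussian PSF."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    enclosed = 1.0 - math.exp(-radius ** 2 / (2.0 * sigma ** 2))
    return 2.5 * math.log10(enclosed)  # negative: total flux is brighter


for fwhm in (0.7, 1.0):  # two example seeing values in arcsec
    corrections = [gaussian_aperture_correction(r, fwhm) for r in RADII]
    print(f'FWHM {fwhm}": ' + " ".join(f"{c:+.3f}" for c in corrections))
```

In this model the correction in the 0.5" aperture differs by about 0.45 mag between the two seeing values, while at 4" the difference is negligible. This is consistent with grouting recalculating only the small apertures.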

2.1.4. Telescope mounts 

VISTA is an alt-azimuth mounted telescope, whereas UKIRT
has an equatorial mount. The focal plane of WFCAM therefore remains
in the same equatorial orientation, but the VISTA focal plane
must rotate with respect to the telescope to keep the same
orientation on the sky during an exposure. Since the focal plane can
rotate, orientation is an additional degree of freedom, and different
programmes can choose the orientation that best accommodates
their survey design. Given the complex processing of image and
catalogue data (see the previous subsection), stacking two images at
very different orientations was considered inadvisable. We therefore
group data by orientation as well as by position and filter when
stacking and tiling. To do this we have added an extra column,
posAngle, to the RequiredStack table (see §3.1). This is the
orientation of the image x-axis relative to the N-S line. As a result,
if images in the same programme lie at the same position and in
the same filter but have very different orientations, they are
processed separately. By default we use a tolerance of 15 deg,
but this can be set programme by programme. In practice
this situation has occurred only once in the Public Surveys: the
VVV team had a small amount of data from the Science Verification
stage that was orientated along the equatorial RA axis, and later
data in the same region of sky that was orientated along the
Galactic longitude axis, so the tiles are orientated at 60 deg
to each other.
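The orientation test can be sketched as follows. The greedy grouping and the wrap-around at 360° are assumptions made for illustration; the text specifies only the default tolerance of 15°. The function names are hypothetical and not part of the VSA curation code.

```python
def orientation_difference(a_deg, b_deg):
    """Smallest absolute difference between two position angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def group_by_orientation(pos_angles, tolerance=15.0):
    """Greedily group frames whose posAngle is within `tolerance` degrees of
    the first member of an existing group; returns lists of frame indices."""
    groups = []  # list of (reference angle, member indices)
    for index, angle in enumerate(pos_angles):
        for reference, members in groups:
            if orientation_difference(angle, reference) <= tolerance:
                members.append(index)
                break
        else:  # no compatible group found: start a new one
            groups.append((angle, [index]))
    return [members for _, members in groups]


# Two orientations 60 deg apart, as in the VVV example, give two groups.
print(group_by_orientation([0.0, 3.0, 60.0, 358.0, 61.5]))  # [[0, 1, 3], [2, 4]]
```

Frames at 0° and 358° end up together because the difference is measured around the circle. The 60° offset exceeds the tolerance and produces a separate stack.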

3. Overview of the VSA 

3.1. The basic structure of the VSA 

WFAU receive VISTA data as multi-extension FITS (MEF;
Pence et al. 2010) files: primarily these come from the CASU
pipeline, which processes the data by Observing Block (OB), 
but WFAU can also handle 'external' FITS images made outside 
VDFS, such as those produced by the survey consortia, as is 
currently done with deep VIDEO mosaics, with WFAU merely 
modifying the headers and file names prior to ingestion so that 
the ingestion code can correctly handle them. The MEFs contain 
either images or catalogues of objects extracted from them. Each 
MEF consists of a primary header, containing metadata relevant 
to the whole observation of the associated pawprint, and one 
extension for each detector of the pawprint, containing meta- 
data relevant to this detector as well as the binary data (image 
or catalogue) associated with this detector. In VSA parlance the 
content of an MEF image file is a multiframe (consisting of im- 
ages - frames - for all detectors) and a detection is an object ex- 
tracted from a single image in a single filter. The metadata head- 
ers from these files are loaded into tables within an RDBMS, as 
are the catalogue tabular data. The header data consists of multi- 
ple cards, each no longer than 80 characters, which contains the 
keywords with their values and descriptions. A standard key- 
word has a length of 8 characters, but longer, more descriptive 
ones, or even keywords consisting of multiple words can be used 
by preceding them with 'HIERARCH'. HIERARCH keywords 
originating at ESO start with 'HIERARCH ESO'. 
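The card layout described above can be illustrated with a simplified parser. This is an assumption-laden sketch rather than the WFAU ingestion code. It handles only value cards: standard keywords in columns 1-8 with the '= ' value indicator in columns 9-10, and HIERARCH keywords whose name runs up to the '=' sign. It treats '' inside a quoted string as an escaped quote and ignores COMMENT, HISTORY and CONTINUE conventions. In practice a FITS library such as astropy.io.fits would be used.

```python
def parse_card(card):
    """Split one FITS header card into (keyword, value, comment, is_hierarch).

    Simplified sketch: value cards only; '' inside a quoted string is an escaped quote.
    """
    card = card.ljust(80)[:80]  # FITS cards are fixed 80-character records
    if card.startswith("HIERARCH "):
        # HIERARCH convention: the keyword is everything up to the '=' sign
        name, sep, rest = card[9:].partition("=")
        if not sep:
            return name.strip(), None, "", True
        keyword, is_hierarch = name.strip(), True
    else:
        keyword, is_hierarch = card[:8].strip(), False  # columns 1-8
        if card[8:10] != "= ":  # value indicator in columns 9-10
            return keyword, None, card[8:].strip(), False
        rest = card[10:]
    rest = rest.strip()
    if rest.startswith("'"):
        chars, i = [], 1
        while i < len(rest):
            if rest[i] == "'":
                if rest[i + 1:i + 2] == "'":  # doubled quote inside the string
                    chars.append("'")
                    i += 2
                    continue
                break  # closing quote
            chars.append(rest[i])
            i += 1
        value = "".join(chars).rstrip()  # trailing blanks in strings are not significant
        comment = rest[i + 1:].partition("/")[2].strip()
    else:
        raw, _, comment = rest.partition("/")
        value, comment = raw.strip(), comment.strip()
    return keyword, value, comment, is_hierarch
```

A parsed HIERARCH keyword beginning with "ESO " would then be a candidate for the ESO keyword tables described below.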

Depending on the survey, further data products may be generated in a database-driven manner. For example, repeated observations of the same field in the same filter may be stacked to create deeper images, from which catalogues are then extracted. Information from multiple single-filter catalogues is then merged to create catalogues of sources, which are objects described by attributes in several filters. The metadata from the additional image data products are ingested into database tables, as are the derived catalogues, while the images themselves are stored in FITS format on disk. Further information may then be derived from the database tables and stored in new tables: e.g. variability information may be derived from multi-epoch data, following the synoptic data model of Cross et al. (2009). The VSA therefore comprises a set of tables within an RDBMS, a collection of FITS files stored on disk, and the interfaces that allow users to access these data.

Whole image files may be selected for download from the archive through web forms or via SQL queries on the image metadata tables, while image cut-outs may be created from these using a different web form. Other web forms provide a basic level of access to the catalogue data, but the real power of the VSA comes from the ability to query RDBMS tables using SQL. This requires knowledge of the VSA database schema. The online VSA schema browser¹⁰ provides detailed descriptions of every column in the hundreds of database tables in the VSA, but we summarise the five main table classes in the remainder of this section.

Throughout this paper, we use a fixed-width font to re- 
fer to VSA database tables: e.g. Multiframe or vvvSource. 

10 http://surveys.roe.ac.uk/vsa/www/vsa_browser.html 



Many of the tables are set up for individual programmes 
(e.g. an individual Public Survey), such as vvvSource for 
the VVV, and videoSource for VIDEO. When we are dis- 
cussing generic properties of "Source" tables, we will abbreviate 
them as Source, rather than using programme-specific names. 
Individual columns within tables are referred to using a bold
font: e.g. multiframeID or aperMag3.

3.1.1. Metadata tables 

The following tables record metadata about images: 

- Multiframe: This contains the main primary header key- 
words from the VISTA images and some additional derived 
quantities that are calculated for each multi-extension image. 

- MultiframeDetector: This contains the main extension header keywords from the VISTA images, except for astrometry-related keywords, and some additional derived quantities that are calculated for each detector.

- CurrentAstrometry: This contains the astrometry-related extension keywords and some additional derived astrometric quantities.

- MultiframeEsoKeys: This contains subsidiary primary 
header keywords that are stored in the hierarchical ESO for- 
mat (HIERARCH ESO). 

- MultiframeDetectorEsoKeys: This contains subsidiary 
extension header keywords that are stored in the hierarchi- 
cal ESO format. 

- AstrometricInfo: This includes additional derived astrometric properties for OB frames (i.e. those created in a single observing block) used in multi-epoch surveys: the half-spaces (see § 8.1.1) for each edge of each frame, and small offsets that can be applied to the frames to improve the astrometric precision.
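Per-edge half-spaces allow a fast point-in-frame test, as the sketch below illustrates. Here each frame edge is represented as the plane through the centre of the celestial sphere and two adjacent corners. Each plane is oriented so that the frame centre lies on its positive side, and a position is inside the frame if it is on the positive side of every plane. This representation and the function names are assumptions for illustration. § 8.1.1 gives the definition actually used.

```python
import math


def radec_to_vec(ra_deg, dec_deg):
    """Unit vector on the celestial sphere for (RA, Dec) in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def edge_half_spaces(corners_radec):
    """One plane normal (through the origin) per frame edge, frame centre on the + side."""
    corners = [radec_to_vec(ra, dec) for ra, dec in corners_radec]
    centre = tuple(sum(c[i] for c in corners) for i in range(3))
    normals = []
    for a, b in zip(corners, corners[1:] + corners[:1]):
        n = cross(a, b)  # normal of the great circle through two adjacent corners
        if dot(n, centre) < 0:
            n = (-n[0], -n[1], -n[2])
        normals.append(n)
    return normals


def in_frame(ra_deg, dec_deg, normals):
    """True if the position lies inside every edge half-space."""
    p = radec_to_vec(ra_deg, dec_deg)
    return all(dot(n, p) >= 0 for n in normals)
```

The normals are computed once per frame. Testing each detection then costs only a few dot products, which is why storing half-spaces per edge is attractive.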

3.1.2. Catalogue data tables

The following are the catalogue data tables used in the VSA. 
There is a different table for each programme, so they will each 
start with the programme acronym: 

- Detection: This contains the extracted sources for each sci- 
ence stack detector frame in MultiframeDetector (indi- 
vidual frame in a multiframe): the raw extraction attributes 
from the original FITS table, the calibrated positions and 
magnitudes and a few other derived quantities. 

- Source: This is a merged filter catalogue from the deepest images in each pointing, and is made "seamless" (Hambly et al. 2008) to allow the user to find the most complete set of unique sources in the programme.

- SynopticSource: This is a merged filter catalogue made 
from detections in contemporaneous images. This is use- 
ful if colours of variable stars are needed. Only those pro- 
grammes designed to have contemporaneous colours will 
have a SynopticSource table. 

- Variability: This contains statistics for the light curves of sources in multi-epoch programmes, allowing selection of variables based on different statistical quantities. VarFrameSetInfo is a useful supporting table for Variability and includes the fitted noise functions for each pointing.

3.1.3. Linking tables

The following are tables that link different types of catalogue 
data or metadata: 



- MergeLog: For each pointing this lists the image frames in 
each filter from which the extracted detections were merged 
together to form the sources in the Source table. These are 
the deepest frames in each pointing. 

- SynopticMergeLog: For each pointing at each epoch, this 
lists the image frames in each filter from which the extracted 
detections were merged together to form the sources in the 
SynopticSource table. 

- BestMatch tables: These link the sources in the 
Source table to each epoch detection in multi-epoch 
surveys, to match epochs for light-curves. There are 
SourceXSynopticSourceBestMatch tables for "contem- 
poraneous" filter data and SourceXDetectionBestMatch 
tables otherwise. 

- Neighbour tables: These are simple tables containing all sources from the master table matching sources from the slave table within a specified radius. They can be used for multiple purposes, such as linking with external surveys: e.g. vhsSourceXDR7PhotoObjAll links the VHS Source table to the Sloan Digital Sky Survey Data Release 7 PhotoObjAll table.

- TilePawPrints: This links tile image detections to the de- 
tections from pawprint images that make up the tile. 

- Provenance: This links image frames to their components, 
e.g. a deep stack to each epoch stack frame that went into it, 
or an epoch stack frame to the raw images. 

- ProgrammeFrame: This assigns image data to a programme and the programme requirements, and is very important for programme curation. The same frame could be used in multiple programmes: for instance, different PI programmes with the same PI in different semesters, or an all-hemisphere release containing data from VHS, VVV, VIKING and VMC.
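A neighbour table can be constructed with a brute-force pairwise search, as sketched below. The sketch assumes hypothetical output columns (masterObjID, slaveObjID, and distanceMins, the separation in arcminutes) and simple (objID, RA, Dec) inputs in degrees. A production cross-match would use a spatial index rather than comparing every pair.

```python
import math


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; degrees in, arcsec out."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def build_neighbour_table(master, slave, radius_arcsec):
    """All (master, slave) pairs within radius_arcsec; O(N*M) for clarity.

    master, slave: lists of (objID, ra_deg, dec_deg).
    Returns rows (masterObjID, slaveObjID, distanceMins).
    """
    rows = []
    for m_id, m_ra, m_dec in master:
        for s_id, s_ra, s_dec in slave:
            sep = ang_sep_arcsec(m_ra, m_dec, s_ra, s_dec)
            if sep <= radius_arcsec:
                rows.append((m_id, s_id, sep / 60.0))
    return rows
```

Every slave object within the radius is kept, not just the nearest. Choosing the best match is left to the query.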

3.1.4. External catalogues 

The scientific goals of surveys tend to require external data 
(e.g. from surveys on other telescopes / instruments at differ- 
ent parts of the electromagnetic spectrum), in addition to data 
from VISTA itself. To support those analyses, the VSA con- 
tains copies of catalogues from a number of external surveys, 
which are listed in the online schema browser. The list of these 
is updated in response to requests from the survey consortia, and new cross-match neighbour tables are added for different programmes as these external surveys become available. The online documentation explains how these neighbour tables can be used to perform effective cross-catalogue queries.
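A common cross-catalogue query keeps only the closest external counterpart of each VISTA source. The sketch below runs such a query against a toy, in-memory copy of a neighbour table using Python's sqlite3 module. The three-column schema (masterObjID, slaveObjID, distanceMins) and the data are assumptions for illustration. The VSA itself is queried through its own SQL interface, not SQLite.

```python
import sqlite3


def build_toy_neighbour_table():
    """In-memory stand-in for a VSA neighbour table (hypothetical schema and data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE vhsSourceXDR7PhotoObjAll (
        masterObjID INTEGER, slaveObjID INTEGER, distanceMins REAL);
    INSERT INTO vhsSourceXDR7PhotoObjAll VALUES
        (1, 100, 0.010), (1, 101, 0.002),  -- source 1 has two SDSS neighbours
        (2, 102, 0.015);                    -- source 2 has one
    """)
    return con


def nearest_neighbours(con, table):
    """Keep only the closest slave object for each master object (ties are all kept)."""
    sql = f"""
        SELECT x.masterObjID, x.slaveObjID, x.distanceMins
        FROM {table} AS x
        WHERE x.distanceMins = (SELECT MIN(y.distanceMins) FROM {table} AS y
                                WHERE y.masterObjID = x.masterObjID)
        ORDER BY x.masterObjID
    """
    return con.execute(sql).fetchall()
```

A correlated subquery is used here for readability. On a large table, an equivalent join against a grouped MIN is usually the better choice.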

3.1.5. Curation tables 

As mentioned above, the operations of the VSA are database-driven once the original MEFs have been ingested, with processing steps and data product provenance recorded automatically in the database. The VSA therefore contains a large number of tables that drive, and are derived from, these curation tasks. Many of these are relevant only to the VSA operations team, but the following contain some information pertinent to users of the VSA:



- Programme: Basic programme information. This includes 
the programme dependent information used to create the 
SQL schema which drives most curation tasks. 

- RequiredTile: The current expected tile product pointings 
and matching tolerance. In the case of VIDEO, for which 
we ingest mosaics provided by the survey team, the relevant 
table is RequiredMosaic. 

- RequiredNeighbours: This lists which neighbour tables joining surveys have been created, and their matching radii.

- PreviousMFDZP: The photometric calibration history of 
each image extension. MFDZP is a contraction of 
MultiframeDetector zeropoint. 
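Keeping a zeropoint history turns recalibration into a simple magnitude shift. The sketch assumes a zeropoint-only calibration, m = ZP - 2.5 log10(flux / t), and ignores extinction and aperture corrections. A hypothetical dictionary stands in for a MultiframeDetector row and a plain list stands in for PreviousMFDZP. The column-like key names are assumptions.

```python
import math


def calibrated_mag(flux_adu, exptime_s, zeropoint):
    """Instrumental flux to calibrated magnitude: m = ZP - 2.5 log10(flux / t)."""
    return zeropoint - 2.5 * math.log10(flux_adu / exptime_s)


def recalibrate(detector, mags, zp_new, history):
    """Record the old zeropoint (as PreviousMFDZP would) and shift the magnitudes.

    detector: dict with hypothetical keys 'multiframeID', 'extNum', 'photZP'.
    history: list of (multiframeID, extNum, old zeropoint) tuples.
    """
    zp_old = detector["photZP"]
    history.append((detector["multiframeID"], detector["extNum"], zp_old))
    detector["photZP"] = zp_new
    return [m + (zp_new - zp_old) for m in mags]
```

Because every magnitude from one extension shifts by the same amount, tables that hold only linking keys need no update at all when the zeropoint changes.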

4. Differences between WFCAM and VISTA Science 
Archives 

While the design of the WSA was developed with ultimate ap- 
plication to VISTA in mind, there are some differences between 
the WSA and VSA structures. 

4.1. Tile and pawprint information in the VSA 

In the VSA, we store catalogues from both the pawprints and the tiles in the detection tables for each survey (e.g. vhsDetection for the VHS survey). The tile catalogues are needed to produce uniform catalogues to the full depth of each survey. However, the astrometric solution in tile catalogues is not quite as good as that in pawprint catalogues. This is because the distortion is less well represented by the tangent-plane (TAN) projection onto which tiles are projected than by the zenithal polynomial (ZPN) projection that can be used for the pawprints (see Calabretta & Greisen 2002). Saturated stars also have better photometry in the pawprint catalogues. Producing pawprint catalogues adds no overhead, since they must be produced as part of tile production to allow the pawprints to be aligned correctly before mosaicking.

Having both tiles and pawprints has created the need for multiple layers of products and more complicated archive curation infrastructure (see § 7) to keep track of them and allow them to be used together. It also means that stack requirements need an additional constraint, the offset position. Each stack that goes into a tile has a different offset position, 0-5, which is a function of the offset (in arcseconds) of the centre (optical axis) of the pawprint from the centre of the tile. These offsets are stored as offsetX and offsetY in the Multiframe table. The offset position (offsetPos) is not the same as offsetID in Multiframe. offsetID simply records the order in which the offset was observed, and may correspond to a different relative position on the tile from one epoch to the next. The observer can choose the order in which the pawprints are executed, but the relative positions of the 6 pawprints are fixed in the OBs currently allowed. The offsetPos, however, always refers to the same part of the tile. There can be considerable overlap between two pawprints from different parts of two different tiles, but they will not be stacked together.
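The distinction between offsetID (execution order) and offset position (fixed place on the tile) can be illustrated by deriving positions purely from geometry. This sketch assumes, hypothetically, that the six pawprints form three rows of two. It numbers positions row by row, in order of offsetY and then offsetX. The mapping actually used by the VSA may differ.

```python
def assign_offset_positions(pawprints, n_rows=3):
    """Map offsetID (observing order) to a fixed position 0-5 derived from geometry.

    pawprints: dicts with 'offsetID', 'offsetX', 'offsetY' (arcsec from the tile centre).
    Assumes n_rows rows with an equal number of pawprints in each.
    """
    by_y = sorted(pawprints, key=lambda p: p["offsetY"])
    per_row = len(by_y) // n_rows
    positions, pos = {}, 0
    for r in range(n_rows):
        # within a row, order pawprints by their x offset
        row = sorted(by_y[r * per_row:(r + 1) * per_row], key=lambda p: p["offsetX"])
        for p in row:
            positions[p["offsetID"]] = pos
            pos += 1
    return positions
```

Two epochs observed in different orders give different offsetID-to-position maps, but a given (offsetX, offsetY) always receives the same position. That is the property that allows stacks to be built per offset position.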

4.1.1. The Tile-PawPrint matching tables

To link the tile and pawprint catalogues together, we have created two extra tables, TileSet and TilePawPrints, which match each detection in a tile catalogue with detections at the same position in the pawprint catalogues. These tables are survey-specific, so VHS, which has detections in vhsDetection, will have tables vhsTileSet and vhsTilePawPrints to link the tile and pawprint catalogues. TileSet and TilePawPrints are designed along the lines of SynopticMergeLog and SynopticSource (Cross et al. 2009). TileSet links the frames together using the multiframe identifiers for the tile and pawprints. TilePawPrints links the detections using the extension numbers (i.e. detector number) and sequence numbers (i.e. the order in which the object was extracted from the frame). The TilePawPrints table is deliberately as narrow as possible: it includes only the necessary linking information, with no additional attributes such as magnitudes, since it is expected always to be used for linking and could be used with a whole variety of attributes. Omitting magnitudes also reduces the number of updates needed when recalibrating the photometry. Examples of linking tile detections to pawprint detections are given in the VSA SQL cookbook.

[Fig. 4 diagram: entity boxes for TilePawPrints (links between tile and pawprint detections), Detection (raw and calibrated astrometric and photometric quantities, for tiles and pawprints), MultiframeDetector (metadata for extensions), TileSet (multiframeIDs of tile and constituent pawprints) and Multiframe (metadata for all images, both tiles and pawprints).]

Fig. 4. The ERM for the tile-pawprint linking tables. TileSet links the 7 multiframes (1 tile and its 6 constituent pawprints) in Multiframe. Each row in TileSet links to many associated rows in the TilePawPrints table, each of which refers to a different object. Each of these objects has between 1 and 7 measurements (detections) in the Detection table. The complication comes from TileSet containing a mix of tile and pawprint frames, which cover different areas on the sky. The tile has a single extension and the pawprints have 16 extensions in MultiframeDetector.

In Fig. 4 we show an entity-relationship model (ERM) for these new tables, showing how they relate to the existing Multiframe, MultiframeDetector and Detection tables. The caption describes the relationships.

Tile and pawprint detections are matched within a fixed radius of 0.8 arcsec in VISTA. This is approximately the average seeing, many times the typical astrometric error, but less than the separation between easily resolvable neighbouring objects. Objects are therefore matched only to the same object on different images, not to neighbouring objects. The matching algorithm is the same as for sources in the Source table (see Hambly et al. 2008). Like MergeLog, TileSet includes all the associated frames as a frame set, consisting of the tile frame (tlmfID) and the 6 pawprint frames (o1mfID to o6mfID), where o1 is the pawprint with offsetID = 1. TilePawPrints then contains the matched detections between the 7 frames, just as Source contains the matched detections between all the different filters. These tables should be used as linking tables to compare tile detections with pawprint detections, or even pawprint detections with each other between offsets.
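The per-offset matching can be sketched as a nearest-neighbour search within the fixed 0.8 arcsec radius. This is a simplification of the Hambly et al. (2008) algorithm, which we do not reproduce. It ignores RA wrap-around at 0/360 deg, uses a small-angle separation and does not enforce one-to-one matches. The input layout is hypothetical.

```python
import math

MATCH_RADIUS_ARCSEC = 0.8


def sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec; adequate at sub-arcsecond scales."""
    dra = (ra2 - ra1) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return math.hypot(dra, dec2 - dec1) * 3600.0


def match_tile_detection(tile_det, pawprint_dets, radius=MATCH_RADIUS_ARCSEC):
    """Nearest detection on each pawprint offset for one tile detection.

    tile_det: (ra, dec); pawprint_dets: {offset: [(extNum, seqNum, ra, dec), ...]}.
    Returns {offset: (extNum, seqNum) or None}; None marks a non-detection.
    """
    result = {}
    for offset, dets in pawprint_dets.items():
        best, best_sep = None, radius
        for ext, seq, ra, dec in dets:
            s = sep_arcsec(tile_det[0], tile_det[1], ra, dec)
            if s <= best_sep:
                best, best_sep = (ext, seq), s
        result[offset] = best
    return result
```

A None entry is where a default row is needed in TilePawPrints, as the following paragraphs explain.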

There is an important difference in the way that tile sets are produced which causes some peculiarities in the TilePawPrints tables that are not present in the SynopticSource tables. Tile sets are merged from frames of two different types (a tile and six pawprints), whereas synoptic frame sets are always merged from one frame type: pawprints in WFCAM or tiles in VISTA. In either case, the frames in a synoptic frame set are similar to one another, and the matching condition is a combination of multiframe identity and extension number. Non-detections in a particular filter have a default entry in the SynopticSource table: if there is no detection in the J band, then jSeqNum is simply set to the standard integer default value (-99999999).

Since tile sets in TileSet are made up of a tile and six pawprints, with the tile composed of, and overlapping, all 16 detectors in each pawprint, tile sets cannot be matched by extension and must be matched by multiframe alone. Therefore it is not possible to have entries in the tile set identified by multiframeID and extNum. They must be identified by multiframeID only, and the links in TilePawPrints must include both the extension number and the sequence number. This makes the assignment of default rows more complicated. For example, if there is no detection in a particular frame, such as pawprint offset 1, we do not know offhand (without additional processing) which extension the detection should have been on. There may be no such extension, since the position may lie in a gap between the detectors for this offset. The natural default row for this missing detection would be o1mfID = multiframeID of the pawprint, o1ExtNum = -9999, o1SeqNum = -99999999. However, there is no equivalent row in the Detection table. The foreign key constraint between the Detection table and MultiframeDetector forbids it, because there are no rows in MultiframeDetector with a non-default multiframeID and a default extNum. Instead we set additional default rows in TilePawPrints with extension numbers equal to 2 and default sequence numbers, so that they can match default rows in the Detection table. The value 2 is used because it is the lowest number of a real science extension and applies to both tiles and pawprints. These defaults are extremely useful in queries that compare the photometry of tile-detected sources in the merged-band catalogues with the pawprint detections. If a query compares the photometry of the tile with each of the 6 pawprints, in most cases one or more pawprints will not overlap a particular tile detection. Without these defaults no row would be returned, even if all 5 other pawprints had a match. With the defaults, the row is returned and the seqNum value makes it clear which pawprints have non-detections. We must emphasize that an entry in TilePawPrints with the key (o1ExtNum=2, o1SeqNum=-99999999) simply means that this object was not detected in pawprint offset 1. It does not denote that the object overlapped with extension 2 of pawprint offset 1. Most likely it came from a part of the tile that did not overlap with pawprint offset 1 at all, although it could simply be too faint to be detected in the pawprint.

Most TilePawPrints rows will be entries where the tile and two pawprints have matched detections. Others will have detections in the tile and just one pawprint (in the outer strips), or in the tile and three, four, five or (infrequently) six pawprints, and some will have detections in the tile only (usually at the faint end). Defaults are added as above where no detection exists. Careful selection of which entries are defaults and which are not will optimise the use of these tables. Any attributes in the detection tables can be compared in this way, although it is necessary to join a separate instance of the Detection table for every frame in the tile set. Since each join is made via the primary key (multiframeID, extNum, seqNum), the joined SQL queries are very efficient.
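The join pattern described above can be demonstrated on a toy in-memory database, with one instance of Detection per frame joined on its primary key. The schema is a heavily reduced, hypothetical version of the real tables: only two pawprint offsets, one magnitude column, and assumed column names such as tlExtNum and tlSeqNum. The default magnitude value is likewise a placeholder. Deleting the default Detection rows shows why they matter, because the inner joins then silently drop any tile detection missing on a pawprint.

```python
import sqlite3

DEFAULT_SEQ = -99999999    # standard integer default for seqNum
DEFAULT_MAG = -9.999995e8  # placeholder default for real-valued columns


def build_toy_db():
    con = sqlite3.connect(":memory:")
    con.executescript(f"""
    CREATE TABLE Detection (multiframeID INTEGER, extNum INTEGER, seqNum INTEGER,
                            aperMag3 REAL, PRIMARY KEY (multiframeID, extNum, seqNum));
    CREATE TABLE TileSet (tileSetID INTEGER PRIMARY KEY, tlmfID INTEGER,
                          o1mfID INTEGER, o2mfID INTEGER);
    CREATE TABLE TilePawPrints (tileSetID INTEGER, tlExtNum INTEGER, tlSeqNum INTEGER,
                                o1ExtNum INTEGER, o1SeqNum INTEGER,
                                o2ExtNum INTEGER, o2SeqNum INTEGER);
    INSERT INTO TileSet VALUES (1, 100, 101, 102);
    INSERT INTO Detection VALUES
      (100, 2, 1, 15.20), (100, 2, 2, 18.90),    -- tile detections
      (101, 5, 7, 15.25),                         -- pawprint offset 1
      (102, 12, 3, 15.18), (102, 12, 9, 18.95),   -- pawprint offset 2
      (101, 2, {DEFAULT_SEQ}, {DEFAULT_MAG}),     -- default rows use extNum = 2
      (102, 2, {DEFAULT_SEQ}, {DEFAULT_MAG});
    INSERT INTO TilePawPrints VALUES
      (1, 2, 1, 5, 7, 12, 3),               -- seen in the tile and both pawprints
      (1, 2, 2, 2, {DEFAULT_SEQ}, 12, 9);   -- not detected in pawprint offset 1
    """)
    return con


def linked_rows(con):
    """Tile magnitude beside each pawprint magnitude: one Detection instance per frame."""
    return con.execute("""
    SELECT tp.tlSeqNum, t.aperMag3, d1.aperMag3, d2.aperMag3,
           d1.seqNum < 0 AS o1Missing
    FROM TilePawPrints AS tp
    JOIN TileSet AS ts ON ts.tileSetID = tp.tileSetID
    JOIN Detection AS t  ON t.multiframeID = ts.tlmfID
                        AND t.extNum = tp.tlExtNum AND t.seqNum = tp.tlSeqNum
    JOIN Detection AS d1 ON d1.multiframeID = ts.o1mfID
                        AND d1.extNum = tp.o1ExtNum AND d1.seqNum = tp.o1SeqNum
    JOIN Detection AS d2 ON d2.multiframeID = ts.o2mfID
                        AND d2.extNum = tp.o2ExtNum AND d2.seqNum = tp.o2SeqNum
    ORDER BY tp.tlSeqNum
    """).fetchall()
```

Calling linked_rows(build_toy_db()) returns both tile detections. The second is flagged with o1Missing = 1 and carries the default magnitude for offset 1.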

To match tile detections in a Source table or SynopticSource table to the detections on the constituent pawprints, it is necessary to remove the pawprint-only detections, leaving a table with only the good tile detections and the necessary defaults. If a query retains the pawprint-only detections, they are interpreted as defaults. A detection missing in a particular filter would then be matched to every set of pawprint detections that is not linked to a tile detection, which is a nonsensical result. To avoid this, we have created a view, TilePawPrintsTDOnly, that can be matched directly in the same way as TilePawPrints. However, for very large datasets such as the VVV, queries work better if users use vvvTilePawPrints and add the necessary constraints to the WHERE clause; see the example queries in the SQL cookbook. The ERM for the matching of the Source table to the pawprint detections is shown in Fig. 5.

[Fig. 5 diagram: entity boxes for Source (tile detections in all filters), TilePawPrintsTDOnly, Detection (raw and calibrated astrometric and photometric quantities, pawprints only), MergeLog (multiframeID / extNum of tile products in all filters) and TileSet.]

Fig. 5. The ERM for the tile-pawprint linking tables matched with the Source table. Detections in the Source table come from tiles. In each MergeLog row there will be at least one non-default tile, up to a maximum equal to the number of filters used in the programme. Each entry in MergeLog, including defaults, is matched to a TileSet, and each Source row to a similar number of rows in TilePawPrintsTDOnly, some of which may be defaults depending on whether there was a detection in each observed filter. The relationship between MergeLog and TilePawPrintsTDOnly is necessary because the multiframe extension information in MergeLog is found in TilePawPrintsTDOnly, not TileSet.
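The filtering performed by the TilePawPrintsTDOnly view shown in Fig. 5 can be sketched on a toy table: rows whose tile sequence number is the default are pawprint-only detections and are removed. The view definition, table layout and column names below are assumptions for illustration, since the real view definition is not given here, and only the removal of pawprint-only rows is shown.

```python
import sqlite3

DEFAULT_SEQ = -99999999

con = sqlite3.connect(":memory:")
con.executescript(f"""
CREATE TABLE TilePawPrints (tileSetID INTEGER, tlExtNum INTEGER, tlSeqNum INTEGER,
                            o1ExtNum INTEGER, o1SeqNum INTEGER);
INSERT INTO TilePawPrints VALUES
  (1, 2, 1, 5, 7),                 -- tile detection matched on pawprint offset 1
  (1, 2, 2, 2, {DEFAULT_SEQ}),     -- tile detection, no pawprint match
  (1, 2, {DEFAULT_SEQ}, 5, 8),     -- pawprint-only detection
  (1, 2, {DEFAULT_SEQ}, 9, 3);     -- pawprint-only detection
-- hypothetical definition: drop rows with no tile detection
CREATE VIEW TilePawPrintsTDOnly AS
  SELECT * FROM TilePawPrints WHERE tlSeqNum <> {DEFAULT_SEQ};
""")


def rows_matching_tile_key(con, table, ext_num, seq_num):
    """How many linking rows a given tile (extNum, seqNum) key would join to."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE tlExtNum = ? AND tlSeqNum = ?"
    return con.execute(sql, (ext_num, seq_num)).fetchone()[0]
```

A source with no tile detection in this filter carries the default key. Against the full table that key joins to both pawprint-only rows, which is the nonsensical result described above. Against the view it joins to none. For a table as large as vvvTilePawPrints, the text recommends writing the equivalent condition directly in the WHERE clause instead.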

In the case of the VHS, which has a single epoch only, we have provided an additional table, vhsSourceXPawPrints, which is a neighbour table between vhsSource and the pawprint detections in vhsDetection. This simply matches all pawprint detections within a given radius to a source. A typical query is shown at the bottom of the tile-pawprints section of the SQL Cookbook. Precision queries on particular offsets or extensions are more difficult than with TilePawPrints, but queries are faster, so this table may be preferable if only the pawprint data are required.



5. Changes to image metadata 

5.1. ESO attributes 

VISTA data pass through an ESO quality control pipeline (whose modules are provided by VDFS) in Garching before being ingested into the VDFS data processing pipeline in Cambridge, and the VDFS-generated data products supplied to the ESO SAF must comply with ESO metadata standards. As a consequence, VISTA data products contain a quantity of standard ESO information not present in WFCAM data products. For example, the headers of image files contain a number of ESO hierarchical FITS keywords. Those required for data processing, or judged to be scientifically useful, are propagated into the Multiframe or MultiframeDetector tables, for keywords from primary or extension headers respectively. The remainder are recorded in the MultiframeEsoKeys and MultiframeDetectorEsoKeys tables.

Initial quality control occurs when the data are checked at the telescope and then at Garching to determine whether they were taken within the required constraints. This results in additional quality control metadata for VISTA, which are included in the VSA but are not found in the WSA. These include OBSTATUS ("Completed", "Executed", "Aborted", "Pending" and "Undefined") and ESOGRADE ("A", fully within constraints; "B", mostly (90%) within constraints; "C"; "D"; "R", rejected). If the OB is not completed, the whole OB will be repeated later. Each OB is also quality-assessed more generally when processed in Cambridge.

There are also requirements for the files that are imported into the ESO Science Archive Facility. We generate the following required keywords for the FITS files sent to the ESO archive:

- ABMAGLIM: the calculated 5σ magnitude limit for point sources in the extension (AB mag);

- ABMAGSAT: the AB magnitude at which sources start to saturate;

- MJDEND: the end time (modified Julian days) of the last exposure that went into the stack;

- TEXPSUM: the summed total exposure time (in seconds) of all the pawprints that went into the tile.

These have been added to MultiframeDetector as abMagLim and abMagSat, and to Multiframe as mjdEnd and totalExpTimeSum. Currently the AB saturation magnitude is only calculated when images are released to ESO, so all abMagSat values in the archive are defaults. The image pixel data have also been scaled and converted from 32-bit floating point to 32-bit integer, as required by the ESO archive. The calculation of the scaling parameter is shown in Appendix A.
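For readers unfamiliar with integer FITS images, the sketch below shows the standard FITS scaling convention, physical value = BZERO + BSCALE * stored value, with a simple range-based choice of BSCALE and BZERO. This range-based choice is our assumption for illustration only. The scaling parameter actually used for VISTA products is the one derived in Appendix A.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def scale_parameters(pixels):
    """Range-based (BSCALE, BZERO) so that every pixel fits in a signed 32-bit integer."""
    lo, hi = min(pixels), max(pixels)
    bzero = 0.5 * (lo + hi)
    span = hi - lo
    # map [lo, hi] onto [-(2**31 - 1), 2**31 - 1]
    bscale = span / (INT32_MAX - INT32_MIN - 1) if span > 0 else 1.0
    return bscale, bzero


def to_int32(pixels, bscale, bzero):
    """Stored integers; clamped in case of floating-point rounding at the extremes."""
    return [max(INT32_MIN, min(INT32_MAX, round((p - bzero) / bscale))) for p in pixels]


def to_physical(stored, bscale, bzero):
    """FITS convention: physical = BZERO + BSCALE * stored."""
    return [bzero + bscale * s for s in stored]
```

Range-based scaling is simple but lets a single extreme pixel set the quantisation step. The round-trip error is at most about half of BSCALE, which is why a scaling tied to the image noise is often preferred.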

5.2. Deprecations 

Deprecation codes in the Multiframe, MultiframeDetector and Detection tables are used to control which frames are used where. The different sub-surveys within UKIDSS followed the same deprecation policy, so it was possible to apply it uniformly across the whole WSA and not release any deprecated data. For VISTA, however, the Public Survey teams have defined different deprecation criteria, so it is not possible to define an analogous uniform policy for the VSA. Instead, it was decided to release all data (deprecated or not) but to define additional deprecation codes indicating whether particular images have been omitted in the creation of higher-order products: deep stacks, tiles or mosaics, Source tables, SynopticSource tables, neighbour tables or multi-epoch/variability tables. Most codes imply exclusion from further use, but the following codes, which are exclusive to the VSA, are more nuanced:

- 50: The frame is good enough for single epoch measure- 
ments, but should not be used in a deep stack; 

- 51: The frame has an intermittency problem with channel 14 (some early frames had this temporary issue in detector 6). During deep stack creation, channel 14 is set to zero weight in a temporary confidence image.

- 53: These are frames where the quality is marginal. Do not use the frames in deep stacks, or use the detections in the variability statistics, but do link them in the best match table, so that the survey team can do more tests to determine whether they are good enough to be used in future releases.



http://heasarc.gsfc.nasa.gov/fitsio/c/f_user/node28.html



http://archive.eso.org/cms/ 
The following codes are standard exclusion codes but are not found in the WSA:

- 55: Aborted OB, used if the science team decided to deprecate based on OBSTATUS;

- 56: Deprecated because of poor ESOGRADE, used if the science team decided to deprecate based on this;

- 58: Deprecated because the catalogue could not be ingested. 
This happened for many very dense fields in early process- 
ing versions, but these have since been replaced. We have 
included the code in this paper for the sake of completeness 
and for users who use older team releases that still contain 
data annotated with this code. If a similar situation arises in 
the future, more data may be deprecated with this value. 
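These codes gate the use of a frame in a way that can be summarised as a small lookup. This is a sketch of our reading of the rules above, not the VSA implementation. The product labels are hypothetical, and we assume that a deprecation value of 0 means the frame is not deprecated. The team policy for OBSTATUS and ESOGRADE is passed in explicitly because it differs between surveys.

```python
# Hypothetical labels for the higher-order products a frame can feed.
ALL_USES = frozenset({"deep_stack", "tile", "source", "synoptic_source",
                      "best_match", "variability"})

# Codes that do not simply exclude a frame (our reading of the list above).
NUANCED = {
    50: ALL_USES - {"deep_stack"},                 # fine for single-epoch use
    51: ALL_USES,                                  # channel 14 zero-weighted in deep stacks
    53: ALL_USES - {"deep_stack", "variability"},  # still linked in BestMatch tables
}


def allowed_uses(dep_code):
    """Products a frame with this deprecation code may contribute to."""
    if dep_code == 0:  # assumed: 0 means not deprecated
        return ALL_USES
    return NUANCED.get(dep_code, frozenset())  # 55, 56, 58, ...: excluded


def eso_deprecation(obstatus, esograde, deprecate_aborted, bad_grades):
    """Deprecation code from ESO QC metadata under one team's chosen policy."""
    if deprecate_aborted and obstatus == "Aborted":
        return 55
    if esograde in bad_grades:
        return 56
    return 0
```

Keeping the team policy as arguments mirrors the point made above: the VSA cannot apply one uniform rule, so each survey's choices must be supplied explicitly.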

6. Changes to catalogue attributes 

Some of the catalogue attributes present in the VSA differ from 
those in the WSA, for several reasons. Differences in the VISTA 
and WFCAM detector properties mean that different effects need 
to be flagged, while the observing strategy changes described in
§ 2.1.1 led to requirements for different information. We have
also introduced additional attributes in response to user demand 
and in the light of enhancements we have made, especially in the 
treatment of multi-epoch data. 

We describe all these changes in the remainder of this sec- 
tion. 

6.1. VIRCAM detector properties

The Raytheon VIRGO detectors used in VIRCAM have no detectable crosstalk and much lower persistence than the Rockwell Hawaii-2 detectors used in WFCAM. For this reason, detections in the VSA do not have their quality bit-flags set for crosstalk (bit 19 of the post-processing error bit flags, ppErrBits; see Hambly et al. 2008), which reduces processing time. However, the VIRCAM detectors have a narrower dynamic range, with non-linearity and saturation occurring at lower flux levels. Non-linearity is calibrated out as part of the VDFS pipeline at CASU, and WFAU have applied a saturation correction (Irwin 2009), where necessary, to the photometry in the VSA. Users wishing to extract photometry for objects brighter than m ~ 13 should use pawprint detections rather than tile detections, since the saturation corrections are more accurate for pawprints. This limit is survey dependent, depending on the DIT and filter used in the OBs.

On the top half of detector 16, the quantum efficiency (QE) varies on short timescales, making flat fields inaccurate. This is particularly noticeable at short wavelengths (~1 μm, e.g. Z and Y). Since tiles are produced from 6 pawprints, each with 16 detectors, we still create a tile even if one or more of the constituent detectors has problems. Each tile pixel comes from up to 6 pawprints, which may have different PSFs (this is why tiles are 'grouted', as described earlier). Tile detections that come partly from detector 16 in one or more pawprints are flagged using bit 12 of the post-processing error flag (ppErrBits), so that users can select a data set with or without these detections, whichever they prefer. Many of these detections have a low average confidence. We have also added a flag for low-confidence detections, bit 7.

Occasionally a particular pawprint detector is deprecated for one of several reasons (Hambly et al. 2008), e.g. poor sky subtraction, bad channels, or the detector not working correctly at the time of observation. The confidence of a deprecated detector is set to zero when making the tile, and this produces poorly defined extractor values (infinities and not-a-number values), which are ingested as defaults into the database. These tile detections are flagged with bit 24.

The two strips at the top and bottom of the tile have half 
the exposure time of most other parts of the tile. We flag these 
with bit 23. This is partly for the users and partly so that these 
detections do not become primary sources in the Source table of 
the survey if they overlap with a full exposure region of another 
tile. 

The list below is a summary of the new detection quality bit- 
flags developed for VISTA tile detections. 

- Bit 7, Low average confidence (< 80) in default aperture. 

- Bit 12, Source image comes partly from detector 16 

- Bit 23, Source lies within a strip of the tile that has half the 
exposure of most of the tile. 

- Bit 24, Source lies within an underexposed region due to a 
missing or deprecated detector 
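Selecting on these flags is a bitwise test on ppErrBits. The sketch assumes that bit n corresponds to the value 2**n, as in the WSA ppErrBits convention. The constant names and helper functions are hypothetical. The same mask test can also be written directly in an SQL WHERE clause.

```python
# Hypothetical names; bit numbers from the list above, flag value = 2**bit.
LOW_CONFIDENCE = 1 << 7         # bit 7
DETECTOR_16 = 1 << 12           # bit 12
HALF_EXPOSURE_STRIP = 1 << 23   # bit 23
UNDEREXPOSED = 1 << 24          # bit 24

TILE_FLAGS = LOW_CONFIDENCE | DETECTOR_16 | HALF_EXPOSURE_STRIP | UNDEREXPOSED


def has_flag(pp_err_bits, flag):
    """True if any bit of flag is set in pp_err_bits."""
    return (pp_err_bits & flag) != 0


def select_clean(detections, exclude=TILE_FLAGS):
    """Keep detections with none of the excluded bits set."""
    return [d for d in detections if (d["ppErrBits"] & exclude) == 0]
```

A user who prefers to keep the detector 16 detections could pass exclude=TILE_FLAGS & ~DETECTOR_16, which reflects the choice the flags are intended to offer.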

These tile flags have not been applied to catalogues from ex- 
ternal mosaics created for VIDEO, since their processing is quite 
different. 

6.2. Changes to attributes in Detection tables 

The VDFS extractor dlrwin et al.l 120041) . which generates the 
raw catalogue parameters for all the VSA data (apart from 
VIDEO mosaic catalogues, which are extracted using Source 
Extractor; iBertin & Arnoutsl [l996) has had a few modifications 
so that there are slightly different output parameters for VISTA 
than WFCAM. In the original FITS catalogues produced by 
the VDFS extractor the Parentjorjohild column (deblend in the 
WSA Detection tables) has been replaced by averagej:onf 
(stored as the avConf in the VSA Detection tables), while the 
Hall radius, Hall flux and Hall flux error have been replaced by 
a half-light radius (halfRad) and flux and flux error (halfFlux, 
halfFluxErr) within an aperture twice the half-light radius. 

6.2.1. Modified Julian Day 

In WFCAM, we used the mjdObs in Multi frame as the time 
for each observation, which was used in light-curves. This at- 
tribute is inadequate in VISTA though, since tiles are made from 
overlapping pawprints which each have different mean obser- 
vation times and each tile detection may come from a different 
combination of pawprin ts. This is partic ularly important for sur- 
veys such as the VMC dCioni et alJ201 ll) . which require a signif- 
icant fraction of an hour of integration time to reach the required 
depth at each epoch, but are looking for variables with periods of 
a few hours. In this case an accurate measurement of the mean 
observation time is fundamental to the science. 

We have now added a new attribute, mjd, to the detection
tables. This is the standard Modified Julian Date (MJD, in
double-precision days since midnight on 17 November 1858) and
is calculated detection by detection in the case of tiles, or
extension by extension for pawprints. During 'grouting', the MJD
of each tile detection is calculated as the weighted average of the
MJDs of the different pawprints that contributed to the tile
detection, weighted by the average confidence in a 1" radius
aperture (aperture 3). In the FITS file, this is expressed as a
4-byte floating-point value in minutes from the beginning of the
day of the observation, in the column MJDOff, with the beginning
of the day given in the header as MJD_DAY. We have also
calculated the mean MJD for each pawprint detector and added
this to MultiframeDetector as mjdMean. This is the value that
becomes the mjd (8-byte double-precision) value in the detection
tables for pawprints, not mjdObs, which is the start time of
the observation.
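The grouting average and the FITS time encoding described above can be illustrated with a short Python sketch. The function names are hypothetical, and we assume the weights are the aperture-3 average confidences supplied as plain numbers; the actual curation code runs inside the archive, not like this.

```python
from typing import Sequence

MINUTES_PER_DAY = 1440.0


def weighted_tile_mjd(pawprint_mjds: Sequence[float],
                      aper3_confidences: Sequence[float]) -> float:
    """Confidence-weighted mean MJD of the pawprints behind one tile detection.

    pawprint_mjds: mean MJD of each contributing pawprint.
    aper3_confidences: average confidence in the 1-arcsec-radius aperture
    (aperture 3) on each pawprint, used as the weight (assumed input form).
    """
    if not pawprint_mjds or len(pawprint_mjds) != len(aper3_confidences):
        raise ValueError("need one weight per contributing pawprint")
    total_weight = sum(aper3_confidences)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(m * w for m, w in zip(pawprint_mjds, aper3_confidences))
    return weighted_sum / total_weight


def mjd_from_fits(mjd_day: float, mjd_off_minutes: float) -> float:
    """Rebuild a full MJD from the MJD_DAY header value and the MJDOff column."""
    return mjd_day + mjd_off_minutes / MINUTES_PER_DAY
```

For example, two pawprints at MJD 55000.0 and 55000.5 with weights 1 and 3 give a tile-detection MJD of 55000.375. One likely reason for the minutes-from-day-start encoding is precision: near MJD 55000 a 4-byte float resolves only about 0.004 day (roughly 6 minutes), whereas an offset of at most 1440 minutes is resolved to a few milliseconds.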



6.2.2. Half-light radii 

As well as ingesting the detection attributes calculated by the
VDFS extractor and written to the FITS catalogue output, the
archive curation software calculates further attributes. These
include several half-light radius measurements, based on the
aperture fluxes and the Petrosian flux, using the same method as
discussed in Smith et al. (2009). These new attributes are:

- hlCircRadAs, the half-light circular radius in arcseconds. 

- hlCircRadErrAs, the error in the half-light circular radius
in arcseconds.

- hlGeoRadAs, the geometric mean of the half-light radius
along the semi-major axis and the half-light radius along
the semi-minor axis, in arcseconds.

- hlSMnRadAs, the half-light radius along the semi-minor 
axis in arcseconds. 

- hlSMjRadAs, the half-light radius along the semi-major 
axis in arcseconds. 

- hlCorSMnRadAs, the half-light radius along the semi- 
minor axis corrected for seeing, in arcseconds. 

- hlCorSMjRadAs, the half-light radius along the semi-major 
axis corrected for seeing, in arcseconds. 

The algorithms used to calculate the above attributes are
given in Appendix B.
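Appendix B gives the actual algorithms. Purely to illustrate the general idea, the sketch below interpolates a half-light radius from a curve of growth (aperture fluxes at increasing radii), given a total flux that here stands in for the Petrosian flux, and forms the geometric-mean radius from the semi-major and semi-minor values. The linear interpolation and the function names are our assumptions, not the VSA implementation, and no seeing correction is applied.

```python
import math
from typing import Sequence


def half_light_radius(radii: Sequence[float], cumulative_flux: Sequence[float],
                      total_flux: float) -> float:
    """Radius enclosing half of total_flux, by linear interpolation of the
    curve of growth (flux inside each aperture radius, radii ascending)."""
    if total_flux <= 0:
        raise ValueError("total flux must be positive")
    half = 0.5 * total_flux
    if half <= cumulative_flux[0]:
        # Half the light already falls inside the smallest aperture:
        # interpolate from the origin, assuming zero flux at zero radius.
        return radii[0] * half / cumulative_flux[0]
    for r0, f0, r1, f1 in zip(radii, cumulative_flux, radii[1:], cumulative_flux[1:]):
        if f0 < half <= f1:
            return r0 + (half - f0) * (r1 - r0) / (f1 - f0)
    raise ValueError("half the total flux lies outside the largest aperture")


def geometric_mean_radius(semi_major: float, semi_minor: float) -> float:
    """Geometric mean of the semi-major and semi-minor half-light radii."""
    return math.sqrt(semi_major * semi_minor)
```

With apertures of radius 1, 2 and 3 arcsec containing 20, 60 and 90 flux units and a total of 100, half the light (50) is reached at 1.75 arcsec.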

6.2.3. Magnitude corrections 

Photometric calibrations, derived by the VDFS pipeline at
CASU (Hodgkin et al. 2012, in preparation)¹⁴, are applied in
the archive curation software. As mentioned in § 6.1, we now
include a saturation correction to the pipeline-produced
magnitudes of stars flagged as potentially saturated. We decided
to include explicit columns that contain this and other
source-dependent corrections (those that are not simply field
dependent), making it easy for users to understand and apply the
corrections themselves.

The current corrections that are applied to the magnitudes by 
WFAU in the VSA are: 

- illumCorr, the illumination or scattered light correction that 
is calculated and provided by CASU for fields on a month by 
month basis. 

- distortCorr, the radial distortion correction, which depends 
on the distance from the optical axis and the filter only. 

- saturatCorr, the saturation correction, which is added to the
1 arcsecond radius aperture magnitude (aperMag3) of bright
stars only (those that are flagged as potentially saturated).

- deltaMag, the sum of the exposure time correction
(2.5 log10(expTime)), the atmospheric extinction correction
((0.5(amStart + amEnd) - 1) × extinctionCat), the illumination
correction and the radial distortion correction. The saturation
correction is not included, because it only applies to
aperMag3. Users can calculate their own magnitudes on the
VISTA photometric system by measuring a flux in any way they
like, applying the zero-point and adding deltaMag.
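A minimal sketch of how a user might apply deltaMag to their own flux measurement, following the recipe above. The function names are hypothetical; the flux is assumed to be the total counts for the exposure, in the units used to define the zero-point; and the extinction term is written with the sign it has in the text, so the sign convention of extinctionCat should be checked against the VSA schema browser before relying on it.

```python
import math


def delta_mag(exp_time: float, am_start: float, am_end: float,
              extinction_cat: float, illum_corr: float,
              distort_corr: float) -> float:
    """Sum of the corrections as listed in the text:
    2.5*log10(expTime) + (mean airmass - 1)*extinctionCat
    + illumCorr + distortCorr. The saturation correction is excluded
    because it applies only to aperMag3."""
    exposure_term = 2.5 * math.log10(exp_time)
    mean_airmass = 0.5 * (am_start + am_end)
    extinction_term = (mean_airmass - 1.0) * extinction_cat
    return exposure_term + extinction_term + illum_corr + distort_corr


def calibrated_mag(flux: float, zero_point: float, dmag: float) -> float:
    """Apply the zero-point to a user-measured flux and add deltaMag.

    Assumes `flux` is the total counts for the whole exposure, so the
    exposure-time term inside dmag converts it to a rate.
    """
    if flux <= 0:
        raise ValueError("flux must be positive to form a magnitude")
    return zero_point - 2.5 * math.log10(flux) + dmag
```

For a 10 s exposure at constant airmass 1.2 with extinctionCat = 0.05 and no illumination or distortion correction, this gives deltaMag = 2.51; a flux of 100 counts with a zero-point of 25.0 then corresponds to 22.51 mag.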



14 http://casu.ast.cam.ac.uk/surveys-projects/vista/technical/photo



The aperture corrections are not included since they are only 
applied to specific magnitudes and are the same for all ob- 
jects on one detector. The values for these are included in the 
MultiframeDetector table. 

WFAU received several requests for aperture magnitudes without
the point-source aperture correction (i.e. for extended sources).
We have therefore included these values for the 7 apertures (1-7)
to which aperture corrections are applied as standard. They are
named aperMagNoAperCorr1, aperMagNoAperCorr2 to
aperMagNoAperCorr7, and can be used for extended sources if
required.

6.3. Changes to attributes in the Source tables 

Several of the attributes in the Detection table have been
propagated through to the Source table or the SynopticSource
table. Of the new Detection attributes, we have propagated
hlCorSMjRadAs and the non-aperture-corrected aperture
magnitudes into the Source table. We do not propagate mjd,
though, since the sources in the Source table come from the
deepest data, stacked across multiple epochs, where the time of
observation is not particularly useful. Since the SynopticSource
table matches data from a specific epoch, and is particularly useful
for variability work on point sources, mjd is propagated into it, but
hlCorSMjRadAs and aperMagNoAperCorr[1-7] are not.

We still produce only one Source table, from the highest product
layer, i.e. the most processed images: tiles if the survey contains
both tiles and pawprint stacks, or mosaics if the survey contains
these (see § 7). Producing one Source table for tiles and another
for pawprints would break the idea of a single master source list
(see Cross et al. 2009). Instead, we have a master source list
produced from tiles, which is linked to pawprints using the
TilePawPrints table (see § 4.1).

7. Infrastructure 

There have been several changes to the science archive
infrastructure that improve the curation of the surveys, but that
can also be useful for scientists who want to make the best use of
the VSA. Some of these changes have been incremental and have
been documented in Collins et al. (2009) and Cross et al. (2009,
2011b). The main changes in the handling of the VISTA Public
Surveys compared with the UKIDSS Public Surveys are listed below:

- Automatically set up all the requirements, the database
schema and the curation table contents using the available data
and basic programme properties from the Programme table. This
was also done for the UKIDSS-DXS and WFCAM PI
programmes.

- Manage multiple layers of products: pawprints, tiles and mo- 
saics, including external products automatically. 

- Have a more sophisticated setup for multi-epoch products, 
specified by the synopticSetup string in Programme. 

We set up all the requirements, the database schema and the
curation table contents for a survey when we start preparing a
static release (e.g. VHS-DR1), using a combination of the
programme requirements in the Programme table and the available
data. We have made the infrastructure and processing the same
for all surveys, unlike in UKIDSS, where the wide shallow surveys
(GPS, GCS and LAS) were processed differently from the deep
surveys (DXS and UDS). This makes things easier for the
operators who run the curation tasks, and makes it much simpler if
programmes evolve in the future. For instance, if the VHS decided
to add a second epoch in any filter, this would be accommodated
automatically.

7.1. Stacks, tiles and mosaics

Requirements for stack, tile and mosaic products are set up 
by grouping the data into different pointings by position, po- 
sition angle and, in the case of pawprints, offset. The require- 
ments for a particular release are stored in RequiredStack, 
RequiredTile, and RequiredMosaic. The stacking software 
uses the definitions to create the deepest stack possible for each 
product in RequiredStack from the pawprints. The tiling soft- 
ware creates tiles at each location in RequiredTile using these 
stacks, so the tile requirements must be linked to the pawprint 
requirements. 

The different layers of products (pawprints, tiles and mosaics)
can be linked to each other using the ProductLinks table: e.g.
tile productID 1 in RequiredTile in the VHS may be composed
of pawprints with productIDs 1, 3, 5, 7, 9 and 11 in
RequiredStack. ProductLinks links the requirements, whereas
Provenance links the image metadata from the actual files. From
one release to another the values in ProductLinks, RequiredTile
and RequiredStack may stay the same (although this is not
guaranteed), but a product which initially contained 2 epochs'
worth of image data may be replaced by one containing 5 epochs'
worth of image data, and will therefore link to different
multiframes in Multiframe and Provenance.

External products (made outside the VDFS), e.g. VIDEO mosaics
(or UKIDSS-UDS mosaics in the WSA), which are created by the
survey team and imported into the VSA, are set up via the
ExternalProducts table, which contains the programme, product
type, release number and information about who created the
mosaic.

The required products and the actual image frames are now
linked to each other via the ProgrammeFrame table, which
includes programmeID, productID and releaseNum, and links to
the image metadata tables via multiframeID. The release number
for products is a running number from when WFAU first started
producing releases for the science teams, so the products in the
first public releases of the VISTA Public Surveys have a variety of
release numbers depending on the programme. In VISTA, the
programme translation for each incoming FITS image is more
complicated than for WFCAM, and the programme-matching
algorithm uses a combination of the HIERARCH ESO OBS PROG
ID, HIERARCH ESO OBS NAME and HIERARCH ESO DPR
CATG header keywords.

ProgrammeFrame is essential for keeping track of which images
are related to each requirement. This makes it much easier for
scientists and VSA support staff to keep track of what has been
created and whether anything is missing. This infrastructure is
crucial for the automated curation (Collins et al. 2009) of VSA
products, where decisions about which tasks need to be run are
made based on the requirements and on what has already been
completed.

When OB frames are recalibrated in multi-epoch programmes,
the OB tiles are compared to the deep tiles and the zero-points
adjusted accordingly. A change to the zero-point of a tile is
propagated to its constituent pawprints. The code to propagate
the zero-point differences was not developed until very recently,
so most datasets in the first release will not include this
propagation; at the time of writing only the VVV dataset will
include it. The pawprint zero-points for the other multi-epoch
public surveys (VIKING, VMC and VIDEO) will be correctly
updated for all recalibrations in the data releases that contain
data from ESO semester P87 and beyond.

7.2. Multi-epoch tables 

We have introduced a new string attribute into the Programme
table, called synopticSetup, to control the production of more
than one BestMatch table, i.e. both a SourceXDetectionBestMatch
and a SourceXSynopticSourceBestMatch table. In surveys such as
the VVV, many scientists would like colour information for
variable stars, so the colours must come from near-contemporaneous
observations. This information is in the SynopticSource table,
whereas the colours in the Source table come from the deepest
images, which are stacked from several epochs of data and are
certainly not contemporaneous. However, this survey will take
many tens of epochs, mostly in one filter, Ks, so a SynopticSource
table that covers the full time range would be inefficient: the Z, Y,
J and H band attribute columns would contain mainly default
values. A more efficient approach is to restrict the SynopticSource
table to a short time range and to specify that the statistics in the
Variability table should come from the data in
SourceXDetectionBestMatch, which covers the whole time range.
It is still necessary to link the SynopticSource table with all the
other tables, so a SourceXSynopticSourceBestMatch table is
required too. The synopticSetup attribute is a string; in the VVV
it has the value BOTH:VAR-UNC:COR,SV,P87, which is parsed to
give the following information: create both BestMatch tables; use
the uncorrelated table (SourceXDetectionBestMatch) when
calculating the variability statistics; and only use data between
the beginning of the Science Verification period (SV) and the end
of ESO Period 87 (P87) in the correlated table
(SourceXSynopticSourceBestMatch).
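The synopticSetup value could be parsed along the following lines. The grammar assumed here (three colon-separated fields, the last being COR followed by a start and an end period) is inferred from the single VVV example in the text; the mapping, class and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from the VAR- code to the BestMatch table used for statistics.
VARIABILITY_TABLES = {
    "UNC": "SourceXDetectionBestMatch",       # uncorrelated, all epochs
    "COR": "SourceXSynopticSourceBestMatch",  # correlated, restricted time range
}


@dataclass(frozen=True)
class SynopticSetup:
    best_match_tables: str   # e.g. "BOTH": create both BestMatch tables
    variability_table: str   # table feeding the Variability statistics
    correlated_start: str    # start of the correlated time range, e.g. "SV"
    correlated_end: str      # end of the correlated time range, e.g. "P87"


def parse_synoptic_setup(value: str) -> SynopticSetup:
    """Parse a value such as 'BOTH:VAR-UNC:COR,SV,P87' (spaces are ignored)."""
    fields = [f.strip() for f in value.split(":")]
    if len(fields) != 3:
        raise ValueError(f"expected three ':'-separated fields in {value!r}")
    tables, var_field, cor_field = fields

    prefix, _, var_code = var_field.partition("-")
    if prefix != "VAR" or var_code not in VARIABILITY_TABLES:
        raise ValueError(f"unrecognised variability field {var_field!r}")

    cor_parts = [p.strip() for p in cor_field.split(",")]
    if len(cor_parts) != 3 or cor_parts[0] != "COR":
        raise ValueError(f"unrecognised correlated field {cor_field!r}")

    return SynopticSetup(tables, VARIABILITY_TABLES[var_code],
                         cor_parts[1], cor_parts[2])
```

Parsing the VVV value yields BOTH tables, variability statistics from SourceXDetectionBestMatch, and a correlated range from SV to P87. Rejecting unknown codes early keeps a mistyped setup from silently producing the wrong tables.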

8. Other recent improvements to the VSA and WSA 

In addition to the above changes necessary for processing
VISTA data, we have made various changes to improve the overall
curation of WFAU products. These extend the database design
described in Hambly et al. (2008) and Cross et al. (2009).

8.1. Improvements to multi-epoch data model and 
calculations 

8.1.1. Creating the BestMatch tables

In Cross et al. (2009, § 9.2), we discussed possible improvements
to the checking of missing observations, so that the BestMatch
tables are created correctly. One method that we discussed was
the half-space method (Budavári et al. 2010), which we have now
implemented. We define 16 half-spaces for each single-epoch
image, in 4 sets of 4 (one per image edge): one set lies 2 pixels
outside the image edges, one 2 pixels inside the image edges, one
2 pixels outside the edges of the jittered region, where the
exposure time per pixel drops from the total exposure time to
some fraction of it, and one 2 pixels inside these edges. We found
that when a half-space is defined using 3 points (the two ends of
an edge and its midpoint), the edge can usually be described to an
accuracy of around one pixel, so a 2-pixel margin on each side
encompasses all the points about which we are unsure. Each
half-space is described using 4 numbers: a 3-D Cartesian vector
normal to the plane of the half-space, and a constant that gives the
offset of the plane from the centre of the sphere. All the
half-space information is stored in a new table, AstrometricInfo
(one for each multi-epoch programme, e.g. vmcAstrometricInfo),
to help with the curation of multi-epoch data. As well as the
half-space information, we have included place-holder columns
for attributes describing small adjustments to the astrometric
solution of each image, which will improve the proper-motion fits
when we start calculating proper motions from VISTA data (see
Collins & Hambly 2012 for a description of the method used for
the wide-area UKIDSS surveys).

The half-spaces are used to check frames which do not have
an expected match to a primary source in the Source table (see
Cross et al. 2009, § 9.2). If there is no detection, this may be
for one of several reasons: the frame does not overlap with the
part of the sky containing the source; the source is within the
jitter region, where the integration time is lower and there may be
a gradient in the integration time across the object; the source is
too faint to be detected in a single exposure; the source is usually
bright enough, but has faded below the detection threshold; the
source is blended with another; or the object has moved
sufficiently far from its expected position.

Using the half-spaces allows us to flag the first two
possibilities. Checking whether a detection should be within the
image or within the jitter region is trivial, and each calculation is
extremely quick. Most importantly, since the half-spaces describe
an edge to within a pixel or two, very few objects need the more
careful test that uses the WCS to calculate the exact position of
the object on the frame: the slow test is reduced to only those
objects within two thin strips, each 4 pixels wide, at the image
edge and at the edge of the jitter region. The half-space
information is stored so that archive users can also use it to
determine rapidly whether an object is within a frame.
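The half-space test can be sketched as follows, assuming unit vectors on the celestial sphere and a half-space stored as a unit normal n and offset c, with points satisfying n·x > c counted as inside. The plane is built through the two ends and the midpoint of an edge, as described above; the names, the sign convention and the way the 2-pixel margin is applied (by moving the plane along its normal) are our assumptions rather than the VSA code.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]
HalfSpace = Tuple[Vector, float]  # (unit normal n, offset c); "inside" means n.x > c


def radec_to_xyz(ra_deg: float, dec_deg: float) -> Vector:
    """Unit vector on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def _sub(p: Vector, q: Vector) -> Vector:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p: Vector, q: Vector) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p: Vector, q: Vector) -> Vector:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def halfspace_from_edge(end1: Vector, mid: Vector, end2: Vector,
                        interior: Vector, shift_rad: float = 0.0) -> HalfSpace:
    """Plane through an edge's two ends and midpoint, oriented so that
    `interior` is inside, then moved inwards by shift_rad (outwards if negative)."""
    n = _cross(_sub(mid, end1), _sub(end2, end1))
    length = math.sqrt(_dot(n, n))
    n = (n[0] / length, n[1] / length, n[2] / length)
    c = _dot(n, end1)
    if _dot(n, interior) < c:  # flip so that the interior has n.x > c
        n, c = (-n[0], -n[1], -n[2]), -c
    # A point at angle t from a great circle has n.x = sin(t), so adding
    # sin(shift_rad) to c moves the boundary by shift_rad (approximately,
    # for planes that do not pass through the centre of the sphere).
    return n, c + math.sin(shift_rad)


def classify(point: Vector, inner: Sequence[HalfSpace],
             outer: Sequence[HalfSpace]) -> str:
    """'inside' if within every inner half-space, 'outside' if beyond any
    outer one, otherwise 'uncertain' (needs the exact WCS test)."""
    if all(_dot(n, point) > c for n, c in inner):
        return "inside"
    if any(_dot(n, point) <= c for n, c in outer):
        return "outside"
    return "uncertain"
```

An object classified as inside the inner half-spaces or outside the outer ones needs no further work; only objects returned as uncertain, in the thin strips between the two sets, require the slower WCS calculation.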

8.1.2. Expected noise model 

We have made changes to the calculation of the expected noise.
The expected noise is still based on fitting a function, in this case
a Strateva function, to the RMS versus mean magnitude data (see
Cross et al. 2009). In early team data releases (before version
1.1 data was released), the expected noise was simply the value
of this function at the mean magnitude of the source. However,
we found that the actual magnitude limit was often considerably
brighter than the expected magnitude limit, especially when the
field is very dense (see Fig. 6). When this happens, the RMS
versus magnitude curve turns over, and simply using the fit will
underestimate the RMS; indeed, the expected RMS is sometimes
negative. To mitigate this, we have made the following changes:

- Calculate the turnover point: the magnitude at which the fitted
RMS reaches its maximum, if the function does turn over.

- For all magnitudes fainter than the turnover point or the faint
end of the magnitude range used in the fit (whichever is
brighter), set the expected RMS to the value at this point (see
Fig. 6).

- For all magnitudes brighter than the bright end of the
magnitude range used in the fit, set the expected RMS to the
value at this point.
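The clamping rules above can be sketched for a Strateva-type model, a + b*10^(0.4m) + c*10^(0.8m). Setting its derivative to zero gives 10^(0.4m) = -b/(2c), which is a maximum only when b > 0 and c < 0. The function names, and treating a model without a turnover as clamped only at the fit range, are our assumptions.

```python
import math
from typing import Optional


def strateva(m: float, a: float, b: float, c: float) -> float:
    """Strateva-type noise model: a + b*10**(0.4 m) + c*10**(0.8 m)."""
    return a + b * 10 ** (0.4 * m) + c * 10 ** (0.8 * m)


def turnover_mag(b: float, c: float) -> Optional[float]:
    """Magnitude of the model's maximum, or None if it does not turn over.

    d/dm = 0 gives 10**(0.4 m) = -b / (2 c); this is a maximum when
    b > 0 and c < 0.
    """
    if b > 0 and c < 0:
        return 2.5 * math.log10(-b / (2.0 * c))
    return None


def expected_rms(m: float, a: float, b: float, c: float,
                 fit_min: float, fit_max: float) -> float:
    """Expected RMS, held constant outside the fit range and beyond the turnover."""
    faint_limit = fit_max
    m_turn = turnover_mag(b, c)
    if m_turn is not None:
        faint_limit = min(faint_limit, m_turn)  # whichever is brighter
    m_eval = min(max(m, fit_min), faint_limit)
    return strateva(m_eval, a, b, c)
```

For example, with a = 0.01, b = 1e-8 and c = -1e-16 the model peaks at m ≈ 19.25, so a source at m = 22 is assigned the RMS at the turnover rather than the (falling, eventually negative) extrapolated value.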

The astrometric fit has also been updated. Instead of
calculating a simple clipped mean, we calculate a weighted mean
position. We calculate the expected astrometric noise in each
filter in the same way as we calculate the expected photometric
noise, using the angular separation between each epoch position
and the median-absolute-deviation-clipped median position for
stationary stars, and fitting a function to these median values as a
function of magnitude. This function describes the locus of the
astrometric uncertainty for non-moving point sources, just as the
equivalent fit for the magnitude RMS describes the photometric
uncertainty as a function of magnitude for non-variable point
sources. This calculated uncertainty as a function of magnitude
will be used to weight the position.

Fig. 6. Magnitude-RMS plot of a dense pointing in the VMC,
where the fitting function turns over before the expected
magnitude limit (dotted vertical line). The old expected RMS as a
function of magnitude is plotted as the dashed line, and the new
expected RMS as a function of magnitude is plotted as the solid
line. Outside the fitting range, or beyond the turnover point, the
expected RMS is constant. Non-variable stars are shown as dots,
and variables as open squares. At the bright end, Y < 12.5 mag,
the RMS increases due to saturation effects, but this has not yet
been included in the model.

A particular pointing may only be observed once in one filter, 
and in some programmes a filter is only observed once in each 
pointing, so it is often not possible to fit a noise model for each 
pointing and filter. For photometric variability statistics, this is 
not a problem, since all the values are default if there is only one 
epoch, but when it comes to the astrometric fit, we would like to 
use all the data in all filters together, to improve the fit. 

To estimate the errors on these frames, we calculate a default 
noise model in each filter which has a calculable noise in at least 
one pointing. We take all the calculated noise models, and calcu- 
late the mean RMS of these models in each of a set of nine bins 
across the magnitude range, and fit the noise model to the mean 
RMSs. This model is used in any pointing in this filter where 
there is only one epoch. 
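The default model for a filter can be sketched as below: each per-pointing noise model is evaluated at the centres of nine magnitude bins, the values are averaged, and a Strateva-type function is fitted to the means. Evaluating the models at bin centres and the plain least-squares solver are our assumptions; since the model is linear in a, b and c, no iterative fitting is needed. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

NoiseModel = Callable[[float], float]  # expected RMS as a function of magnitude


def binned_mean_rms(models: Sequence[NoiseModel], m_lo: float, m_hi: float,
                    n_bins: int = 9) -> List[Tuple[float, float]]:
    """Average the per-pointing models at the centre of each magnitude bin."""
    width = (m_hi - m_lo) / n_bins
    centres = [m_lo + (i + 0.5) * width for i in range(n_bins)]
    return [(m, sum(model(m) for model in models) / len(models)) for m in centres]


def fit_strateva(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of rms = a + b*10**(0.4 m) + c*10**(0.8 m).

    The model is linear in (a, b, c), so the normal equations are solved
    directly; the basis is rescaled by a reference magnitude to keep the
    3x3 system well conditioned.
    """
    m_ref = max(m for m, _ in points)
    rows = [(1.0, 10 ** (0.4 * (m - m_ref)), 10 ** (0.8 * (m - m_ref)))
            for m, _ in points]
    ys = [rms for _, rms in points]
    # Augmented normal-equation matrix [A^T A | A^T y].
    mat = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * y for r, y in zip(rows, ys))]
           for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(mat[k][col]))
        mat[col], mat[pivot] = mat[pivot], mat[col]
        for k in range(col + 1, 3):
            factor = mat[k][col] / mat[col][col]
            for j in range(col, 4):
                mat[k][j] -= factor * mat[col][j]
    # Back substitution.
    p = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(mat[i][j] * p[j] for j in range(i + 1, 3))
        p[i] = (mat[i][3] - tail) / mat[i][i]
    a, b_scaled, c_scaled = p
    # Undo the rescaling so the coefficients apply to the raw magnitude.
    return a, b_scaled * 10 ** (-0.4 * m_ref), c_scaled * 10 ** (-0.8 * m_ref)
```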

For filters where there is only ever one epoch, we cannot
directly measure the noise as a function of magnitude, so we
assume that the behaviour in any filter is similar to that in the
others (which is borne out by experience of surveys where
multiple observations are taken in all filters). We expect that the
limit reached at the bright end will be the same for the same DIT
in the OBs, and that the increase in noise towards the faint end
depends on the depth of the exposure, which depends on the
total exposure time or expected magnitude limit. Moreover, any
other differences, such as the effects of sky brightness or residual
non-linearity, are likely to be a function of wavelength, so we
choose the nearest filter in wavelength that has enough epochs
for a fit to be made to the RMS as a function of magnitude. We
take the default model in this filter and adjust for the difference
in expected magnitude limit; e.g. for the Strateva model, we
calculate new values for b and c as follows:



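$$\langle f(m) \rangle = a + b\,10^{0.4m} + c\,10^{0.8m}$$

$$\Delta m = m_l - m_l'$$

$$a = a', \qquad b = b'\,10^{-0.4\,\Delta m}, \qquad c = c'\,10^{-0.8\,\Delta m},$$

where $m_l$ is the magnitude limit of frames in this filter and $m_l'$ is
the magnitude limit of frames in the comparison filter, whose default
model has coefficients $a'$, $b'$ and $c'$.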
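Because the adjustment only rescales b and c, it is equivalent to shifting the comparison filter's model by Δm in magnitude, which is easy to check numerically. A minimal sketch with hypothetical function names:

```python
from typing import Tuple


def strateva(m: float, a: float, b: float, c: float) -> float:
    """Strateva-type noise model: a + b*10**(0.4 m) + c*10**(0.8 m)."""
    return a + b * 10 ** (0.4 * m) + c * 10 ** (0.8 * m)


def transfer_noise_model(a_ref: float, b_ref: float, c_ref: float,
                         mag_limit: float,
                         mag_limit_ref: float) -> Tuple[float, float, float]:
    """Adapt the comparison filter's default model (a', b', c') to a filter
    whose frames reach magnitude limit `mag_limit` instead of `mag_limit_ref`."""
    delta_m = mag_limit - mag_limit_ref
    return a_ref, b_ref * 10 ** (-0.4 * delta_m), c_ref * 10 ** (-0.8 * delta_m)
```

If the single-epoch filter reaches 0.5 mag deeper than the comparison filter, the transferred model at magnitude m + 0.5 equals the comparison model at m; the bright-end floor a is unchanged, consistent with the assumption that it depends only on the DIT.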

The weighted mean position uses a 3σ-clipped weighted mean
in each of the three Cartesian coordinates, and then converts back
to equatorial coordinates. The type of fit used is recorded in the
VarFrameSetInfo table as motionModel. The model described
above is a static weighted model, 'wgtstatic'. When we have
VISTA data spanning several years, we expect to fit for proper
motion too.
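The 3σ-clipped weighted mean position can be sketched as below. We assume inverse-variance weights built from the per-epoch astrometric uncertainty predicted by the noise model, and clip each Cartesian coordinate independently; the function names are hypothetical, and proper motion is not modelled, as in the 'wgtstatic' case.

```python
import math
from typing import Sequence, Tuple


def _clipped_weighted_mean(values: Sequence[float], weights: Sequence[float],
                           nsigma: float = 3.0, max_iter: int = 10) -> float:
    """Iterative nsigma-clipped weighted mean of one Cartesian coordinate."""
    keep = [True] * len(values)
    mean = 0.0
    for _ in range(max_iter):
        kept = [(v, w) for v, w, k in zip(values, weights, keep) if k]
        w_sum = sum(w for v, w in kept)
        mean = sum(w * v for v, w in kept) / w_sum
        sigma = math.sqrt(sum(w * (v - mean) ** 2 for v, w in kept) / w_sum)
        new_keep = [abs(v - mean) <= nsigma * sigma for v in values]
        if new_keep == keep or not any(new_keep):
            break
        keep = new_keep
    return mean


def weighted_mean_position(ras_deg: Sequence[float], decs_deg: Sequence[float],
                           sigmas: Sequence[float],
                           nsigma: float = 3.0) -> Tuple[float, float]:
    """Clipped weighted mean of epoch positions, returned as (RA, Dec) in degrees.

    sigmas are per-epoch astrometric uncertainties in any consistent unit;
    inverse-variance weighting is an assumption of this sketch.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    xyz = []
    for ra, dec in zip(ras_deg, decs_deg):
        r, d = math.radians(ra), math.radians(dec)
        xyz.append((math.cos(d) * math.cos(r), math.cos(d) * math.sin(r), math.sin(d)))
    x, y, z = (_clipped_weighted_mean([p[i] for p in xyz], weights, nsigma)
               for i in range(3))
    # The averaged vector is not unit length, but atan2 only needs its direction.
    ra_mean = math.degrees(math.atan2(y, x)) % 360.0
    dec_mean = math.degrees(math.atan2(z, math.hypot(x, y)))
    return ra_mean, dec_mean
```

Averaging in Cartesian coordinates avoids the wrap-around at RA = 0/360 and the convergence of meridians near the poles, which a naive mean of RA and Dec would suffer from.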

8.2. Improvements to the interface 

The VSA proprietary and public release databases can be
accessed and queried via the web-browser-based interface. The
various access methods allow users to perform SQL queries on
the science-ready tables, extract image cut-outs, and download
entire image and catalogue files. In addition, public releases will
be accessible through the Virtual Observatory (VO), and will be
discoverable in the VO registries. A Table Access Protocol¹⁵
(TAP) interface to each data release will allow users to perform
SQL queries using the Astronomical Data Query Language
(ADQL). We already have partially compliant TAP services
available for all our main data holdings. They can be accessed
through software such as TOPCAT (Taylor 2005) and VOExplorer,
as well as through standard HTTP GET and POST requests.
The services and their endpoints are registered on the major VO
registries. We also have cone search and Simple Image Access
Protocol (SIAP¹⁶) services available on the VO for the same list
of datasets.

Fully compliant TAP services will likely be available by the 
end of 2012. 

9. Illustrative science examples 

9.1. Colours of VHS point sources 

The optical-infrared colour-colour plot is a powerful classifier of
different types of stars, with most stars lying along a narrow
locus. However, extinction and poor photometry can widen this
locus and prevent the separation of brown dwarfs, QSOs and
compact galaxies. The following query selects point sources in
the VHS which are matched to point sources in the SDSS and are
not flagged for poor quality. We select the colours and positions,
but only for stars in areas of low Galactic extinction.

SELECT s.sourceID, s.ra, s.dec,
/* select position colour and magnitude
   information */
  (sdPho.psfMag_g-s.jAperMag3) AS gmjPnt,
  jmksPnt, ksAperMag3
/* from vhsSource, SDSS DR7 PhotoObjAll,
   neighbour table */
FROM vhsSource AS s,
  vhsSourceXDR7PhotoObjAll AS x,
  BESTDR7..PhotoObjAll AS sdPho
/* join the tables */
WHERE s.sourceID = x.masterObjID AND
  x.slaveObjID = sdPho.objID
/* find matches within 2 arcsec */
  AND x.distanceMins <= 0.0333
/* that are nearest matches */
  AND x.distanceMins IN (
    /* sub query to find minimum distance
       for a match to this sourceID */
    SELECT MIN(distanceMins)
    FROM vhsSourceXDR7PhotoObjAll
    WHERE masterObjID = x.masterObjID)
/* select SDSS primary objects and stars */
  AND x.sdssPrimary = 1 AND x.sdssType = 6
/* objects with no flags in VHS */
  AND jppErrBits = 0 AND ksppErrBits = 0
/* stars or probable stars in VHS */
  AND mergedClass IN (-1,-2)
/* Not default magnitudes in SDSS or VHS */
  AND sdPho.psfMag_g > 0. AND s.jAperMag3 > 0.
  AND s.ksAperMag3 > 0.

Fig. 7. The main locus of stars matched between the VHS and
SDSS, selected in (g-J) versus (J-Ks).

15 http://www.ivoa.net/Documents/TAP/

16 http://www.ivoa.net/Documents/SIA/

These data can be plotted using TOPCAT (Taylor 2005), and
the main locus can be found, as seen in Fig. 7. The gradient of
the locus in the colour-colour plot can be measured and rare
objects can then be selected further.
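Once the data are exported from TOPCAT, the gradient of the locus could be measured with an iteratively σ-clipped straight-line fit of (g-J) against (J-Ks), with the clipped points forming a list of candidate rare objects. This is a generic sketch; the 3σ threshold and the function name are our choices, not part of the VHS selection.

```python
from typing import List, Sequence, Tuple


def fit_locus(x: Sequence[float], y: Sequence[float], nsigma: float = 3.0,
              max_iter: int = 10) -> Tuple[float, float, List[bool]]:
    """Iteratively sigma-clipped least-squares line y = intercept + slope * x.

    Returns (slope, intercept, outlier_flags); outliers are candidate
    objects lying off the stellar locus.
    """
    keep = [True] * len(x)
    slope = intercept = 0.0
    for _ in range(max_iter):
        xs = [xi for xi, k in zip(x, keep) if k]
        ys = [yi for yi, k in zip(y, keep) if k]
        n = len(xs)
        x_mean, y_mean = sum(xs) / n, sum(ys) / n
        sxx = sum((xi - x_mean) ** 2 for xi in xs)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        # Residuals for every object, but the RMS only from the kept ones.
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        rms = (sum(r ** 2 for r, k in zip(resid, keep) if k) / n) ** 0.5
        new_keep = [abs(r) <= nsigma * rms for r in resid]
        if sum(new_keep) < 3 or new_keep == keep:
            break
        keep = new_keep
    return slope, intercept, [not k for k in keep]
```

Here x would be jmksPnt and y would be gmjPnt from the query above; the objects flagged as outliers are the starting point for selecting rarer populations.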

9.2. Flare stars 

The following query selects objects that could be flaring stars or
some type of cataclysmic variable. To do this, we select sources
whose minimum Ks magnitude is at least 2 magnitudes brighter
than their median magnitude and which, further, have at least 2
measurements that are brighter than the median by 0.5
magnitudes. This second constraint should remove sources with
one point that has escaped flagging. We also require at least 5
good Ks detections for a reasonable light curve. The following
query was performed on the VVV survey.

SELECT v.sourceID, s.ra, s.dec,
/* select some useful attributes, pointing info,
   number of observations, min, median, maximum,
   variable class, and star/galaxy class */
  v.frameSetID, ksnGoodObs, ksMinMag, ksMedianMag,
  ksMaxMag, variableClass, mergedClass,
  (ksMedianMag-ksMinMag) AS ksFlareMag,
  COUNT(*) AS nBrightDetections
/* from vvvVariability and vvvSource */
FROM vvvVariability AS v, vvvSource AS s,
  vvvSourceXDetectionBestMatch AS b,
  vvvDetection AS d
/* first join the tables */
WHERE s.sourceID = v.sourceID AND b.sourceID = v.sourceID
  AND b.multiframeID = d.multiframeID
  AND b.extNum = d.extNum AND b.seqNum = d.seqNum AND
/* select the magnitude range, brighter than
   Ks=17 and not default. */
  ksMedianMag < 18. AND ksMedianMag > 0. AND
/* at least 5 observations */
  ksnGoodObs >= 5 AND ksBestAper = 5 AND
/* Min mag is at least 2 magnitudes brighter
   than median mag (but minMag is not default) */
  (ksMedianMag-ksMinMag) > 2. AND ksMinMag > 0. AND
/* Only good Ks band detections in same
   aperture as statistics are calculated in */
  d.seqNum > 0 AND d.ppErrBits IN (0,16) AND
  d.filterID = 5 AND d.aperMag5 > 0. AND
  d.aperMag5 < (ksMedianMag-0.5)
/* Group detections */
GROUP BY v.sourceID, s.ra, s.dec,
  v.frameSetID, ksnGoodObs, ksMinMag, ksMedianMag,
  ksMaxMag, variableClass, mergedClass
HAVING COUNT(*) > 2
/* Order by largest change in magnitude first. */
ORDER BY (ksMedianMag-ksMinMag) DESC

We plot the Ks-band light curve of one of these objects in
Fig. 8. The majority of the detections are at Ks ≈ 14.1 mag, but
there is a flare of almost 2.5 mag, followed by fading of 1 mag
before the star returns to Ks = 13.9 mag.

9.3. Global properties of VIKING-SDSS galaxies 

The following selection uses IR photometry and sizes from
VIKING combined with optical colours and redshifts (both
spectroscopic and photometric) from SDSS. In this query we use
neighbour tables to join VIKING and SDSS. We use the SQL
operator UNION to combine the query that matches to galaxies
with spectra with the query for those with only photometric
redshifts. Users who are concerned about completeness can use
UNION to combine further queries, such as those that select
VIKING galaxies without SDSS matches, or SDSS matches with
neither spectroscopic nor photometric redshifts. The combined
queries must return the same number of columns, in the same
order and with compatible data types; the column names of the
result are taken from the first query, so for clarity we give the
columns the same names in both. Where one query has a column
for which the other has no entry (e.g. redshift, z), we can fill this
column with a default value, just as we demonstrate below with
the redshift confidence, redshift status and spectroscopic type.
Fig. 8. The light curve of a flare star in the VVV selected using
the query in § 9.2.



SELECT
/* select information necessary to create
   bi-variate brightness distribution,
   extinction corrected Petrosian magnitudes
   put into AB system and seeing corrected,
   semi-major axis size, PLUS SDSS colours
   and redshifts (spectroscopic and
   photometric) */
  s.sourceID, s.ra, s.dec, s.frameSetID,
  (s.hPetroMag-s.aH+fh.VegaToAB) AS hPetroAB,
  s.hHlCorSMjRadAs, (s.hPetroMag+
  2.5*log10(2.*3.14159*s.hHlCorSMjRadAs*
  s.hHlCorSMjRadAs)-s.aH+fh.VegaToAB) AS hSurfBright,
  (s.ksPetroMag-s.aKs+fks.VegaToAB) AS ksPetroAB,
  s.ksHlCorSMjRadAs, (s.ksPetroMag+
  2.5*log10(2.*3.14159*s.ksHlCorSMjRadAs*
  s.ksHlCorSMjRadAs)-s.aKs+fks.VegaToAB) AS ksSurfBright,
  dr7spec.objID AS sdssID,
  ((dr7spec.modelMag_u-dr7spec.extinction_u)-
  (dr7spec.modelMag_g-dr7spec.extinction_g)) AS umgModel,
  z, zErr, zConf, zStatus, specClass
/* from vikingSource, Filter (one for each
   filter for VegaToAB), SDSS-DR7 neighbour
   table, SDSS SpecPhoto table */
FROM vikingSource AS s, Filter AS fh, Filter AS fks,
  vikingSourceXDR7PhotoObjAll AS xdr7,
  BESTDR7..SpecPhotoAll AS dr7spec
/* First join tables, */
WHERE xdr7.masterObjID = s.sourceID AND
  fh.filterID = 4 AND fks.filterID = 5 AND
  dr7spec.objID = xdr7.slaveObjID AND
/* select VIKING primary sources matched to
   SDSS primary sources */
  (priOrSec = 0 OR priOrSec = frameSetID) AND
  sdssPrimary = 1 AND
  dr7spec.sciencePrimary = 1 AND
/* within 2" of nearest match */
  xdr7.distanceMins < 0.03333 AND
  xdr7.distanceMins IN (
    SELECT MIN(distanceMins)
    FROM vikingSourceXDR7PhotoObjAll
    WHERE masterObjID = xdr7.masterObjID AND
    sdssPrimary = 1) AND
/* for objects classified as galaxies or
   probable galaxies in VIKING */
  mergedClass IN (1,-3) AND
/* h and ks size is 0.7<sma<=10. arcsec */
  ksHlCorSMjRadAs > 0.7 AND hHlCorSMjRadAs > 0.7
  AND ksHlCorSMjRadAs <= 10.0 AND
  hHlCorSMjRadAs <= 10.0 AND
/* good quality data in VIKING h and ks */
  hppErrBits = 0 AND ksppErrBits = 0 AND
/* ks extinction corrected AB mag < 20.5 */
  (ksPetroMag-aKs+fks.VegaToAB) < 20.5 AND
/* ra and dec range to restrict to where
   SDSS is */
  s.ra > 100. AND s.ra < 250. AND s.dec > -5. AND
/* z>=0.002 */
  dr7spec.z >= 0.002

/* Add in ones which do not have SDSS spectra
using UNION */
UNION
SELECT
/* select information necessary to create
bi-variate brightness distribution,
extinction corrected Petrosian magnitudes
put into AB system and seeing corrected,
semi-major axis size AND SDSS matches to
PhotoObj table and photoz table */
s.sourceID, s.ra, s.dec, s.frameSetID,
(s.hPetroMag-s.aH+fh.VegaToAB) AS hPetroAB,
s.hHlCorSMjRadAs, (s.hPetroMag+
2.5*log10(2.*3.14159*s.hHlCorSMjRadAs*
s.hHlCorSMjRadAs)-s.aH+fh.VegaToAB) AS
hSurfBright, (s.ksPetroMag-s.aKs+fks.VegaToAB)
AS ksPetroAB, s.ksHlCorSMjRadAs, (s.ksPetroMag+
2.5*log10(2.*3.14159*s.ksHlCorSMjRadAs*
s.ksHlCorSMjRadAs)-s.aKs+fks.VegaToAB) AS
ksSurfBright, dr7phot.objID AS sdssID,
((dr7phot.modelMag_u-dr7phot.extinction_u)-
(dr7phot.modelMag_g-dr7phot.extinction_g))
AS umgModel, photz.z AS z, photz.zErr AS zErr,
-9.9999 AS zConf, -9 AS zStatus, -9 AS specClass
/* from vikingSource, Filter (one for each
filter for VegaToAB), SDSS-DR7 neighbour
table, */
FROM vikingSource AS s, Filter AS fh, Filter AS
fks, vikingSourceXDR7PhotoObjAll AS xdr7,
BESTDR7..PhotoObjAll AS dr7phot,
BESTDR7..photoz AS photz
/* First join tables, */
WHERE xdr7.masterObjID=s.sourceID AND
fh.filterID=4 AND fks.filterID=5 AND
dr7phot.objID=xdr7.slaveObjID AND photz.objID=
dr7phot.objID AND dr7phot.objID NOT IN (
SELECT dr7spec.objID
FROM BESTDR7..SpecPhotoAll AS dr7spec
WHERE dr7spec.objID=xdr7.slaveObjID AND
dr7spec.sciencePrimary=1) AND
/* select VIKING primary sources matched to
SDSS primary sources */
(priOrSec=0 OR priOrSec=frameSetID) AND
sdssPrimary=1 AND
/* within 2" of nearest match */
xdr7.distanceMins<0.03333 AND
xdr7.distanceMins IN (
SELECT MIN(distanceMins)
FROM vikingSourceXDR7PhotoObjAll
WHERE masterObjID=xdr7.masterObjID AND
sdssPrimary=1) AND
/* for objects classified as galaxies or
probable galaxies in VIKING */
mergedClass IN (1,-3) AND
/* h and ks size is 0.7<sma<=10. arcsec */
ksHlCorSMjRadAs>0.7 AND hHlCorSMjRadAs>0.7 AND
ksHlCorSMjRadAs<=10. AND hHlCorSMjRadAs<=10.0
/* good quality data in VIKING h and ks */
AND hppErrBits=0 AND ksppErrBits=0 AND
/* ks extinction corrected AB mag < 20.5 */
(ksPetroMag-aKs+fks.VegaToAB)<20.5 AND
/* ra and dec range to restrict to SDSS */
s.ra>100. AND s.ra<250. AND s.dec>-5. AND
/* z>=0.002 */
photz.z>=0.002
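The magnitude and surface-brightness expressions in the SELECT list above can be checked offline. The following is a minimal Python sketch of the same arithmetic; the function names are hypothetical, and the Galactic extinction and VegaToAB offset are passed in as arguments rather than read from the VSA tables. The effective surface brightness follows because half of the total flux falls within the half-light radius r, so the mean surface brightness inside r is (m + 2.5 log10 2) + 2.5 log10(πr²) = m + 2.5 log10(2πr²), which is the expression used in the query.

```python
import math


def ab_mag(vega_mag: float, extinction: float, vega_to_ab: float) -> float:
    """Extinction-corrected AB magnitude, as in (PetroMag - a + VegaToAB)."""
    return vega_mag - extinction + vega_to_ab


def effective_surface_brightness(vega_mag: float, r_hl_arcsec: float,
                                 extinction: float, vega_to_ab: float) -> float:
    """Mean surface brightness (mag arcsec^-2) inside the half-light radius.

    Half of the total flux lies within an area pi * r^2, which gives
    mu = m + 2.5 * log10(2 * pi * r^2).
    """
    m_ab = ab_mag(vega_mag, extinction, vega_to_ab)
    return m_ab + 2.5 * math.log10(2.0 * math.pi * r_hl_arcsec ** 2)
```

Larger half-light radii give fainter (numerically larger) surface brightnesses for the same total magnitude, which is why the 0.7" size cut appears as a hard edge in the surface brightness plot.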
Fig. 9. Ks-band AB Petrosian magnitude versus redshift. Very few galaxies in this plot have redshifts greater than z = 1 (shown in red). The galaxies with z > 1 are likely to have spectroscopic redshifts, whereas the photometric redshifts are limited to z < 1. Galaxies with Ks,AB < 18.2 tend to have z < 1, so we have selected a sample in TOPCAT with Ks,AB < 18.2 and z < 1, shown in green.
Fig. 10. Ks-band AB Petrosian magnitude versus u-g model magnitude colour for the Ks,AB < 18.2, z < 1 sample selected in Fig. 9. At the bright end the red sequence is clear, but at fainter magnitudes galaxies lie at higher redshift, so the observed colours are less meaningful.



We plot the magnitude against the redshift for these galaxies (Fig. 9) and find an artificial cut-off at z = 1. The most likely explanation is that the photometric redshifts are limited to this range, since the SDSS optical colours do not give reliable photometric redshifts outside it. By selecting a subsample with Ks,AB < 18.2 and z < 1, we obtain a more complete sample with reliable redshifts. We use this sample to examine the colour-magnitude and surface brightness-magnitude distributions of the galaxies (Figs. 10 and 11). Surface brightness, colour, magnitude and redshift are all fundamental for classifying galaxies.
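The Ks,AB < 18.2, z < 1 cut was made interactively in TOPCAT. For readers who prefer to script the same step, a minimal Python sketch applied to rows returned by the query above might look as follows; the `Galaxy` record and its field names are hypothetical stand-ins for the ksPetroAB and z output columns.

```python
from typing import Iterable, List, NamedTuple


class Galaxy(NamedTuple):
    """A subset of one output row of the VIKING-SDSS query."""
    source_id: int
    ks_petro_ab: float  # extinction-corrected Ks AB Petrosian magnitude (ksPetroAB)
    z: float            # spectroscopic or photometric redshift


def reliable_redshift_subsample(rows: Iterable[Galaxy],
                                ks_limit: float = 18.2,
                                z_limit: float = 1.0) -> List[Galaxy]:
    """Keep galaxies brighter than ks_limit (AB) with redshift below z_limit."""
    return [g for g in rows if g.ks_petro_ab < ks_limit and g.z < z_limit]
```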
Fig. 11. Ks-band AB Petrosian magnitude versus effective Ks-band surface brightness for the Ks,AB < 18.2, z < 1 sample selected in Fig. 9. The hard limit at the top right shows the size limit of 0.7". The main galaxy population appears to have a high surface brightness limit of μ_Ks = 18 mag arcsec^-2, with a small group at higher surface brightnesses (shown as blue crosses), which are either compact galaxies or stars that have slipped through all the selection criteria. Galaxies have surface brightnesses as low as μ_Ks = 24.0 mag arcsec^-2, but fainter galaxies would need to be selected before the VIKING surface brightness limit became a limiting factor.

9.4. Extragalactic variables in VIDEO 

In deep extragalactic surveys, such as VIDEO, with many epochs over several months or years, it is possible to find a range of AGN, and very occasionally supernovae. There are also a few foreground stars that show variability. We select point-source variables in the VIDEO survey which show a range in magnitudes greater than 0.1 mag in any filter. Since the filters in this survey are not taken simultaneously, and AGN show sporadic variability, sometimes variations may only be seen in one filter.

SELECT s.sourceID, s.ra, s.dec, v.frameSetID,
v.zMedianMag, v.zMagRms, v.znGoodObs, v.zSkewness,
(v.zMaxMag-v.zMinMag) AS zRange, v.yMedianMag,
v.yMagRms, v.ynGoodObs, v.ySkewness,
(v.yMaxMag-v.yMinMag) AS yRange, v.jMedianMag,
v.jMagRms, v.jnGoodObs, v.jSkewness,
(v.jMaxMag-v.jMinMag) AS jRange, v.hMedianMag,
v.hMagRms, v.hnGoodObs, v.hSkewness,
(v.hMaxMag-v.hMinMag) AS hRange, v.ksMedianMag,
v.ksMagRms, v.ksnGoodObs, v.ksSkewness,
(v.ksMaxMag-v.ksMinMag) AS ksRange
FROM videoVariability AS v, videoSource AS
s /* join tables */
WHERE v.sourceID=s.sourceID AND
/* point source variables */
s.mergedClass IN (-1,-2) AND
v.variableClass=1 AND
/* delta mag > 0.1 in ANY filter, with
at least 5 good obs in that filter */
(((zMaxMag-zMinMag)>0.1 AND zMinMag>0.
AND znGoodObs>=5) OR ((yMaxMag-yMinMag)>0.1
AND yMinMag>0. AND ynGoodObs>=5) OR
((jMaxMag-jMinMag)>0.1 AND jMinMag>0. AND
jnGoodObs>=5) OR ((hMaxMag-hMinMag)>0.1
AND hMinMag>0. AND hnGoodObs>=5) OR
((ksMaxMag-ksMinMag)>0.1 AND ksMinMag>0.
AND ksnGoodObs>=5))

Fig. 12. Ks-band magnitude range versus H-band magnitude range. While there is a correlation between the two ranges, the H-band range seems greater than the Ks-band range on average. There are some objects which have no discernible variation in Ks or H, but do vary in one or more of the other filters.
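The OR'd magnitude-range clause in the query above can be mirrored in Python, for example to re-apply the cut to a table that has already been downloaded. This is a minimal sketch: the dictionary keys reuse the videoVariability column names, but the function itself is hypothetical, and it covers only the range clause, not the mergedClass and variableClass conditions.

```python
from typing import Mapping

FILTERS = ("z", "y", "j", "h", "ks")


def passes_range_cut(stats: Mapping[str, float], min_range: float = 0.1,
                     min_good_obs: int = 5) -> bool:
    """True if any filter has (max - min) > min_range with enough good epochs.

    Keys follow the videoVariability column names, e.g. "ksMaxMag",
    "ksMinMag" and "ksnGoodObs". Like the SQL condition xMinMag>0., a
    filter only counts if its minimum magnitude is positive; we take this
    to exclude missing-value defaults. Absent keys count as failing.
    """
    for band in FILTERS:
        max_mag = stats.get(band + "MaxMag", 0.0)
        min_mag = stats.get(band + "MinMag", 0.0)
        n_good = stats.get(band + "nGoodObs", 0)
        if (max_mag - min_mag > min_range and min_mag > 0.0
                and n_good >= min_good_obs):
            return True
    return False
```

Because the bands are tested independently and combined with OR, a source varying in a single filter is retained, matching the remark that variations may only be seen in one filter.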

We can plot some of the variability statistics, such as the range in the Ks band against the range in the H band (Fig. 12), or the RMS against the skewness in the Ks band (Fig. 13). These types of plots help to classify different types of variable and to pick out odd objects.

We can then select one of these objects, e.g. the object in Fig. 13 from pointing 1 with a Ks-band RMS > 0.4 mag, which is more than twice the RMS of any of the other objects, and plot its light curve. To do this, we run a second query:

SELECT
/* Select time, filter, magnitude, magnitude
error and flags */
d.mjd, d.filterID, d.aperMag3, d.aperMag3Err,
d.ppErrBits
/* From BestMatch table to link all
observations of the same source and
videoDetection for each observation */
FROM videoDetection AS d,
videoSourceXDetectionBestMatch AS b
/* First join tables */
WHERE b.multiframeID=d.multiframeID AND
b.extNum=d.extNum AND b.seqNum=d.seqNum
/* then select only detections and sourceID
equal to object in previous selection
which has a Ks-band RMS>0.4 mag */
AND d.seqNum>0 AND b.sourceID IN (
SELECT s.sourceID
FROM videoVariability AS v, videoSource AS s
/* join tables */
WHERE v.sourceID=s.sourceID AND
/* point source variables */
s.mergedClass IN (-1,-2) AND
v.variableClass=1 AND
/* delta mag > 0.1 in ANY filter, with
at least 5 good obs in that filter */
(((zMaxMag-zMinMag)>0.1 AND zMinMag>0.
AND znGoodObs>=5) OR ((yMaxMag-yMinMag)>0.1
AND yMinMag>0. AND ynGoodObs>=5) OR
((jMaxMag-jMinMag)>0.1 AND jMinMag>0. AND
jnGoodObs>=5) OR ((hMaxMag-hMinMag)>0.1 AND
hMinMag>0. AND hnGoodObs>=5) OR
((ksMaxMag-ksMinMag)>0.1 AND ksMinMag>0. AND
ksnGoodObs>=5))
/* Ks-band RMS > 0.4 mag */
AND ksMagRms>0.4 AND s.frameSetID=644245094401)
/* order by time */
ORDER BY d.mjd

Fig. 13. Ks-band skewness versus RMS for objects with at least 5 good Ks-band observations. These have been split into the two pointings (pointing 1, frameSetID=644245094401; pointing 2, frameSetID=644245094402), which show similarities, at least for RMS < 0.2 mag. Most of the selected objects have positive skews in the Ks band. The skew decreases as the RMS increases at RMS < 0.2 mag, although the reason for this is not clear.

We can use TOPCAT^17 to plot the light curve (Fig. 14). The light curve is very interesting, showing a short phase of brightening followed by a longer phase of fading, characteristic of an exploding star, probably a Type Ia supernova, with a maximum brightness of Z = 18.5 mag. The position and time of this object match SN2010gy (Chornock et al. 2010). The discovery team found that it has a redshift of z = 0.06, but could not find a likely host galaxy. The thumbnail of this source from the deep Ks-band mosaic is shown in Fig. 15.
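Once the light-curve query has returned its rows, the rise and decline described for Fig. 14 can also be read off programmatically. The sketch below is a minimal illustration under our own assumptions: rows are tuples in the query's column order (mjd, filterID, aperMag3, aperMag3Err, ppErrBits), only detections with ppErrBits = 0 are kept, and the peak is simply the brightest good epoch. The function and its summary keys are hypothetical, not VSA attributes.

```python
from typing import Dict, Sequence, Tuple

# (mjd, filterID, aperMag3, aperMag3Err, ppErrBits), as selected by the query
Row = Tuple[float, int, float, float, int]


def rise_and_decline(rows: Sequence[Row], filter_id: int) -> Dict[str, float]:
    """Summarise a single-band light curve around its brightest good epoch."""
    pts = sorted((mjd, mag) for mjd, fid, mag, _err, flags in rows
                 if fid == filter_id and flags == 0)
    if len(pts) < 2:
        raise ValueError("need at least two good epochs in this filter")
    # Smallest magnitude = brightest point.
    i_peak = min(range(len(pts)), key=lambda i: pts[i][1])
    peak_mjd, peak_mag = pts[i_peak]
    first_mjd, first_mag = pts[0]
    last_mjd, last_mag = pts[-1]
    return {
        "peak_mjd": peak_mjd,
        "peak_mag": peak_mag,
        "rise_mag": first_mag - peak_mag,      # brightening before the peak
        "rise_days": peak_mjd - first_mjd,
        "decline_mag": last_mag - peak_mag,    # fading after the peak
        "decline_days": last_mjd - peak_mjd,
    }
```

Because sparse sampling can miss the true maximum, the rise time obtained this way is an upper limit on the real one, which is consistent with the "less than 10 days" wording in the caption of Fig. 14.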

Fig. 14. Light curve of the variable selected in the text. This variable brightens by ~2 magnitudes in less than 10 days and then fades by almost 5 magnitudes over the next 100 days. This is the expected behaviour of a Type Ia supernova.

Fig. 15. Thumbnail of the point-source variable whose light curve is shown in Fig. 14. Thumbnails can be shown by selecting the attributes ra, dec and frameSetID from the Source table and clicking on the supplied link.

17 http://www.star.bris.ac.uk/~mbt/topcat/

18 http://www.eso.org/sci/observing/phase3/data_releases.html

10. First Public Data Releases

The first public releases of VISTA Public Survey data through the VSA are intended to match the DR1/2 datasets published for each survey in the ESO SAF^18. Thus, they will cover the data up to the end of ESO semester P85 (i.e. up to 30th September 2010). Some of the surveys have released data from P86 as well, but with the following additional constraints:

- VMC: the data released will only be those fields where the whole set of epochs is complete, i.e. the 30 Doradus field (5h37m40s, -69°22'18") and the Gaia South Ecliptic Pole field (5h59m23s, -66°20'28").

- VIKING: the data will only be released in the following fields: GAMA09 (33 fields overlapping with the Galaxy And Mass Assembly 09h field; GAMA, Driver et al. 2011), CFHTLS-W1 (6 fields overlapping with the Canada-France-Hawaii Telescope Legacy Survey W1 field) and 9 fields in the Southern Galactic Pole region.

- VIDEO: the data will only be released in the pointings for which VIDEO mosaics have been created (XMM3 field, 2h26m18s, -4°44'; ES1-North field, 0h37m49s, -43°30').

We have cropped all the tables in the surveys to match the pointings specified by the PIs. The excluded data will be released in future releases. Table [?] summarises the contents of the releases. Unlike UKIDSS, each survey will be released into a separate database. The sixth public survey, UltraVISTA, is not using VDFS processing, except for the initial pawprint pipeline calibration, so we are not releasing data from this survey.

11. Summary and future work

The VSA was designed as the main access point to all VISTA science data, allowing users to select carefully the data they need rather than bulk-download all the data, a difficult and time-consuming job in the era of billion-row catalogues. As we have shown in the Illustrated Examples (§ 9), the VSA is designed to allow users to select on a wide range of attributes and to work with external data, such as the Sloan Digital Sky Survey, WISE, GLIMPSE and OGLE. The VSA is based on the WFCAM Science Archive but has VISTA-specific features and various improvements based on our experience of WFCAM data and archive processing.

In the future we plan several enhancements. In the near future, we are working on improvements to our interface, including MyDB-style access (Li & Thakar 2008), where users can combine queries with Python scripts to produce a powerful work environment. We are also improving our plotting tools so that they can more easily show very large datasets, using a combination of density maps where the number of points is huge and individual points where the density drops below a threshold.
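The hybrid plotting scheme mentioned above, with density maps where points are crowded and individual points elsewhere, can be illustrated with a short sketch. This is one possible approach, not the VSA implementation, which the text describes only as planned; the square-grid binning and the `cell_size` and `threshold` parameters are our assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def split_dense_sparse(points: Iterable[Point], cell_size: float,
                       threshold: int) -> Tuple[Dict[Cell, int], List[Point]]:
    """Grid the points; dense cells become counts, sparse cells keep their points.

    Cells containing at least `threshold` points are returned as a count map
    (to be drawn as a density image); points in all other cells are returned
    individually (to be drawn as a scatter plot).
    """
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    dense = {key: len(pts) for key, pts in cells.items() if len(pts) >= threshold}
    sparse = [p for pts in cells.values() if len(pts) < threshold for p in pts]
    return dense, sparse
```

Only the dense-cell counts and the sparse points need to be sent to the client, so the amount of plotted data stays bounded even for very large selections.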
Appendix A: Image scaling

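For single-epoch OB stacks,

$$\mathrm{bscale} = \frac{1}{\mathrm{NDIT}\,\sqrt{\mathrm{NJITTER}}}, \tag{A.1}$$

and for tiles,

$$\mathrm{bscale} = \frac{1}{\mathrm{NDIT}\,\sqrt{2\,\mathrm{NJITTER}}}, \tag{A.2}$$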

where NDIT is the number of readouts during the integration of a raw image and NJITTER is the number of jitter positions in the jitter pattern used to create a single pawprint stack. For deep stacks, we would scale by the number of epochs in the same way as the number of jitters, but since NDIT and NJITTER can vary from stack to stack in the same programme, pointing and filter, we calculate bscale for deep stacks as

$$\mathrm{bscale}_{\rm deep} = \frac{1}{\sqrt{\sum_i \mathrm{bscale}_i^{-2}}}. \tag{A.3}$$

For deep tiles, to take account of different integration times 
in each offset and deprecated detectors in some OB stacks, we 
compute the bscale values for each detector as above in each 
overlap and average over all overlaps, just as we calculate the 
total exposure time for a tile. 
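A minimal Python sketch of Eqs. (A.1)-(A.3) follows. It assumes that the deep-stack combination in Eq. (A.3) adds the per-stack values in inverse quadrature, so that N identical stacks give bscale/√N, consistent with the statement that epochs are treated like jitter positions. The function names are hypothetical, and the per-detector averaging over overlaps used for deep tiles is not shown.

```python
import math
from typing import Sequence


def bscale_stack(ndit: int, njitter: int) -> float:
    """Eq. (A.1): single-epoch OB (pawprint) stack."""
    return 1.0 / (ndit * math.sqrt(njitter))


def bscale_tile(ndit: int, njitter: int) -> float:
    """Eq. (A.2): tile."""
    return 1.0 / (ndit * math.sqrt(2 * njitter))


def bscale_deep(bscales: Sequence[float]) -> float:
    """Eq. (A.3), assumed form: combine per-stack values in inverse quadrature.

    N identical inputs give bscale / sqrt(N), i.e. each epoch acts like an
    extra set of jitter positions; stacks with different NDIT or NJITTER
    are weighted automatically.
    """
    if not bscales:
        raise ValueError("need at least one input stack")
    return 1.0 / math.sqrt(sum(b ** -2 for b in bscales))
```

For example, four epochs with NDIT = 1 and NJITTER = 4 give a deep-stack bscale of 0.25, exactly as a single stack with 16 jitter positions would.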

Appendix B: Half-light radii 

The sizes of extended sources are difficult to measure for various 
reasons: 

- The outer parts of a galaxy eventually merge into the sky background, so it is difficult to know how much of the galaxy is lost in the sky. For intrinsically low surface brightness galaxies or high-redshift galaxies, the majority of the light may well be lost in the background and any measurement is a significant underestimate (e.g. Disney 1976; Cross et al. 2001).
- The profile is not always smooth or axisymmetrical, e.g.
grand design spirals, irregular galaxies, interacting galaxies. 

- Nearby objects can make it difficult to get accurate measure- 
ments of the total luminosity and extent of a galaxy. The 
measurement of the background level is sometimes incorrect 
and this can affect a curve-of-growth measurement. 

- Galaxies have various inclinations to the line of sight. 
Different researchers may want to use measurements that 
correct for inclination or do not. 

- The light of galaxies and all objects is convolved with a point-spread function that will particularly affect small objects.

- The method must be robust enough and quick enough to be applied to the VISTA detection tables. We exclude the VVV and VMC, since these are in extremely dense regions where contamination from nearby objects is almost guaranteed and the vast majority of sources are stars, which are point sources. Even so, the catalogues with size measurements will contain > 10^8 sources, and maybe 10^9 sources.

Our measurements of the size of galaxies try to take into account all the above effects as much as possible. We define our basic size measurement as the radius containing half of the flux of the galaxy, the half-light radius, a measurement that has been used extensively before (Kormendy 1977; Cross et al. 2001; Blanton et al. 2001). The main difficulty with this measurement is measuring the total flux.

To take into account the missing light from the outer parts of the galaxy, we use the Petrosian flux (Blanton et al. 2001; Graham et al. 2005), which is generally insensitive to the effects of surface brightness: if you keep the galaxy profile the same (the relative flux as a function of radius) but reduce or increase the average surface brightness, moving it closer to or further from the sky noise level, then the Petrosian measurement will return the same flux each time. This breaks down eventually: if you reduce the surface brightness enough, the galaxy will not even be detectable against the sky, and close to this limit the total flux and size become difficult to measure with any accuracy.

The Petrosian flux, however, gives different results for different profiles, which is an issue. Blanton et al. (2001) showed that while only 0.7% of the flux of an exponential disk galaxy was typically missed by the Petrosian flux, 22% of the flux of a de Vaucouleurs-profile elliptical galaxy was missed, and 5% of the flux of a PSF-dominated profile was missed, although Graham & Driver (2005) show that there are slightly different results for a standard Petrosian definition compared to the SDSS Petrosian that Blanton et al. used. Small galaxies close to the seeing limit will be dominated by the point-spread function. Galaxies close to the surface-brightness limit of the survey could be missing much more of the light. To account for the missing light, we assume that all galaxies are missing 10% of their flux. This will be an overestimate in some cases and an underestimate in others, but calculating a correction for each galaxy would require an iterative procedure that would take much longer, and in any case it would be better to fit profiles for all objects (e.g. Peng et al. 2002). The light profiles of galaxies are often well fit by Sérsic profiles (Graham et al. 2005), a more general function that includes the exponential (β = 1), de Vaucouleurs (β = 4) and Gaussian (β = 0.5) profiles,



$$I(r) = I_{\rm hl}\,\exp\left\{-k\left[\left(\frac{r}{r_{\rm hl}}\right)^{1/\beta} - 1\right]\right\}, \tag{B.1}$$

where $r_{\rm hl}$ is the half-light radius, $I_{\rm hl}$ is the intensity at $r_{\rm hl}$, $\beta$ is the Sérsic index and $k$ is a constant, dependent on $\beta$, chosen so that $r_{\rm hl}$ encloses half of the total flux. To save time, we use the existing catalogue products to calculate the half-light radii, rather than going back to the images (e.g. Liske et al. 2003). The half-light radii are calculated using the existing circular aperture flux measurements, which give a curve of growth. We use the 13 aperture fluxes measured by the VDFS extractor, at radii of 0.5, 0.5√2, 1, √2, 2, 2√2, 4, 5, 6, 7, 8, 10 and 12 arcsec. To calculate the half-light radius, we first find the aperture flux closest to half the total flux and then use the five apertures centred on it (two before, two after and the aperture in question). We fit a quadratic to these five aperture fluxes, which removes any small bumps in the curve, using the singular value decomposition method (Golub & Reinsch 1970). The root of the quadratic gives the half-light radius ($r_{\rm hl}^{\rm circ}$, hlCircRadAs). We use the covariance matrix to calculate the error in the half-light radius ($\sigma_{r}$, hlCircRadErrAs), adding another half-pixel in quadrature to take into account the granularity of the data.

The 13 aperture fluxes are all measured in circular apertures, so the calculated half-light radius assumes circular symmetry. However, most galaxies are elliptical in shape, being triaxial spheroidal systems, inclined disks or a combination of the two. A geometric-mean size is usually considered a more useful measurement for triaxial elliptical galaxies (e.g. see Binney & Merrifield 1998), and a semi-major axis size (Driver et al. 2005), which recovers the radius of the disk whatever the inclination, is more useful for disk galaxies. Fig. B.1 shows the ratio of the half-light semi-major axis to the circular half-light radius as a function of ellipticity for a range of Sérsic profiles (β = 0.5 to β = 7), profiles that are a good fit to the vast majority of virialised galaxies. As can be seen, the variation between these profiles is around 1-2% for all ellipticities < 0.9, but rises to 10% at maximum ellipticity. These curves are well fit by Moffat functions,

$$\frac{r_{\rm hl}^{\rm smj}}{r_{\rm hl}^{\rm circ}} = \frac{a}{\left[1 + \left(\frac{1-e}{b}\right)^{2}\right]^{c}}, \tag{B.2}$$

where $e$ is the ellipticity and $a$, $b$ and $c$ are found from fitting the data for each profile.

Thus, we can convert our circular half-light radii, $r_{\rm hl}^{\rm circ}$, to a semi-major axis size, $r_{\rm hl}^{\rm smj}$, by using the β = 2 function (a = 1.8243, b = 0.30914, c = 0.24304). We choose β = 2 since most galaxies will be exponential disks (β = 1), de Vaucouleurs (β = 4) or dominated by the PSF (β = 0.5), and β = 2 is nicely in the middle, but as Fig. B.1 shows, there is very little difference.

Fig. B.1. Ratio of the half-light semi-major axis to the circular half-light radius versus ellipticity for different Sérsic profiles. The lines are the best-fit Moffat functions in each case.

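The conversions to the half-light semi-minor axis, $r_{\rm hl}^{\rm smn}$, and the half-light geometric mean, $r_{\rm hl}^{\rm gm}$, are easily computed from the geometry of an ellipse:

$$r_{\rm hl}^{\rm smn} = (1-e)\,r_{\rm hl}^{\rm smj}, \tag{B.3}$$

$$r_{\rm hl}^{\rm gm} = \sqrt{r_{\rm hl}^{\rm smn}\,r_{\rm hl}^{\rm smj}}. \tag{B.4}$$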
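The conversion from the circular half-light radius to semi-major, semi-minor and geometric-mean sizes can be sketched as below. We take the Moffat fit to have the form r_smj/r_circ = a [1 + ((1-e)/b)²]^(-c); with the quoted β = 2 coefficients this gives a ratio of about 1.01 at e = 0, as expected for a round source, which supports that reading, but the functional form is our reconstruction. The function names are hypothetical.

```python
import math
from typing import Tuple

# Moffat-function coefficients quoted in the text for the beta = 2 Sersic profile.
MOFFAT_A, MOFFAT_B, MOFFAT_C = 1.8243, 0.30914, 0.24304


def smj_to_circ_ratio(e: float) -> float:
    """r_hl^smj / r_hl^circ as a function of ellipticity e (assumed Moffat form)."""
    return MOFFAT_A / (1.0 + ((1.0 - e) / MOFFAT_B) ** 2) ** MOFFAT_C


def half_light_sizes(r_circ: float, e: float) -> Tuple[float, float, float]:
    """Semi-major, semi-minor and geometric-mean half-light sizes from r_circ."""
    r_smj = smj_to_circ_ratio(e) * r_circ
    r_smn = (1.0 - e) * r_smj        # Eq. (B.3): minor axis = (1 - e) * major axis
    r_gm = math.sqrt(r_smn * r_smj)  # Eq. (B.4): geometric mean of the two axes
    return r_smj, r_smn, r_gm
```

The ratio rises monotonically with ellipticity, so a flattened galaxy is assigned a semi-major size noticeably larger than its circular half-light radius.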

Finally, we take into account the effects of seeing. We use the method of Driver et al. to subtract the measured seeing, assuming a Gaussian PSF,

$$r_{\rm hl}^{\rm smj,see} = \sqrt{\left(r_{\rm hl}^{\rm smj}\right)^{2} - \left(c_{\rm see}\,\Gamma\right)^{2}}, \tag{B.5}$$

where Γ is the full width at half maximum of stars in the image and $c_{\rm see}$ is a constant, 0.5 for a Gaussian PSF. By experiment we found values of $c_{\rm see} \approx 0.45$, but with quite a large uncertainty, so we took the theoretical value $c_{\rm see} = 0.5$.
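The quadrature subtraction of the seeing can be written in a few lines. For a circular Gaussian, the radius enclosing half the light is exactly FWHM/2, and convolving two Gaussians adds their half-light radii in quadrature, which is why c_see = 0.5 is the theoretical value. This is a minimal sketch; returning 0 when the measured size does not exceed the PSF term is our own choice, not necessarily what the VSA stores in that case.

```python
import math


def seeing_corrected(r_obs: float, fwhm: float, c_see: float = 0.5) -> float:
    """Subtract the PSF in quadrature: sqrt(r_obs**2 - (c_see * fwhm)**2).

    Returns 0.0 when r_obs is not larger than c_see * fwhm (a choice made
    for this sketch only).
    """
    psf_term = c_see * fwhm
    if r_obs <= psf_term:
        return 0.0
    return math.sqrt(r_obs ** 2 - psf_term ** 2)
```

For instance, a measured semi-major half-light size of 1.0" in 1.2" seeing corresponds to an intrinsic size of about 0.8".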